NFL QB Performance Data Analysis¶
Executive Summary¶
This project analyzes NFL play-by-play data from 2019-2023 to move beyond traditional metrics and build a multi-faceted definition of "clutch" and "elite" quarterback performance. The analysis reveals that a quarterback's value is best understood through a combination of situational performance, longitudinal trends, and data-driven archetypes.
The analysis framework first establishes that while 4th quarter comebacks are notable, a quarterback's ability to elevate their play on high-leverage downs (specifically 3rd Down) is a more consistent indicator of success. We then expand beyond single-season snapshots with a time-series analysis, which highlights that elite players distinguish themselves by consistently performing above the league average over multiple years.
The capstone of the project is the deployment of an unsupervised KMeans clustering model, which successfully segments players into three distinct archetypes: "Elite Quarterbacks," "The League Core," and "Struggling & Backups." This machine learning approach provides the most crucial insight: "elite" status is not just about accuracy or aggressiveness, but a rare combination of both. This project delivers a robust, data-driven framework for evaluating quarterbacks in the moments that matter most, providing a significant competitive advantage in player evaluation and team strategy.
Table of Contents¶
1. Data Loading & Cleaning¶
- Initial data loading, handling of null values, and feature engineering.
2. Exploratory Data Analysis (EDA)¶
- 2.1 4QC & GWD Analysis
- Analysis of traditional "clutch" metrics: 4th Quarter Comebacks and Game-Winning Drives.
- 2.2 Performance by Down
- Examining the performance delta between 1st and 3rd Downs.
- 2.3 Time-Series Performance Analysis
- Visualizing passer rating trends over time versus the league average.
3. Machine Learning: QB Archetype Analysis¶
- 3.1 Clustering with KMeans
- Using the Elbow Method and interpreting clusters.
- 3.2 Interactive Archetype Visualization
- Mapping QB archetypes on an interactive chart.
4. Predictive Modeling¶
- (Your original section for predictive modeling)
5. Synthesis & Recommendations¶
import sqlite3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from collections import defaultdict
import nfl_data_py as nfl
import plotly.io as pio
pio.renderers.default = 'notebook_connected'
1. Data Loading & Cleaning¶
df_2023_passing = pd.read_csv('../nfl_2023_passing1.csv')
df_2023_passing.describe
<bound method NDFrame.describe of Rk Player Age Team Pos G GS QBrec \
0 1 Tua Tagovailoa 25.0 MIA QB 17.0 17.0 11-6-0
1 CPoY-5 TagoTu00 NaN NaN NaN NaN NaN NaN
2 2 Jared Goff 29.0 DET QB 17.0 17.0 12-5-0
3 3 Dak Prescott 30.0 DAL QB 17.0 17.0 12-5-0
4 MVP-2AP OPoY-5 PresDa01 NaN NaN NaN NaN NaN NaN
.. ... ... ... ... ... ... ... ...
125 113 Garrett Wilson 23.0 NYJ WR 17.0 17.0 NaN
126 114 Christian Kirk 27.0 JAX WR 12.0 12.0 NaN
127 115 Ja'Marr Chase 23.0 CIN WR 16.0 16.0 NaN
128 NaN League NaN NaN NaN NaN NaN NaN
129 Average NaN NaN NaN NaN NaN NaN NaN
Cmp Att ... QBR Sk Yds.1 Sk% NY/A ANY/A 4QC GWD \
0 388.0 560.0 ... 60.8 29.0 171.00 4.92 7.56 7.48 2.0 2.0
1 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN
2 407.0 605.0 ... 60.3 30.0 197.00 4.72 6.89 6.99 2.0 3.0
3 410.0 590.0 ... 72.7 39.0 255.00 6.20 6.77 7.28 2.0 3.0
4 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN
.. ... ... ... ... ... ... ... ... ... ... ...
125 0.0 1.0 ... 3.2 0.0 0.00 0.00 0.00 0.00 0.0 0.0
126 1.0 2.0 ... 3.1 0.0 0.00 0.00 -0.50 -0.50 0.0 0.0
127 1.0 1.0 ... 1.8 0.0 0.00 0.00 -7.00 -7.00 0.0 0.0
128 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN
129 NaN 64.5 ... NaN NaN 7.15 6.04 5.82 NaN NaN NaN
Awards Player-additional
0 PBAP NaN
1 NaN NaN
2 NaN GoffJa00
3 PBAP-2AP NaN
4 NaN NaN
.. ... ...
125 NaN WilsGa00
126 NaN KirkCh01
127 PB ChasJa00
128 NaN NaN
129 -9999 NaN
[130 rows x 34 columns]>
Analyzing the description of another dataset from a CSV file, sourced from PFR data.
Cross-referencing statistics for verification.
Loading and cross-referencing below as well.
Player Seasonal Data (2023)¶
Sourced from pro-football-reference.com
# A quick analysis into the seasonal data provided from PFR.
nfl.import_seasonal_pfr('pass', [2023]).head()
| player | team | pass_attempts | throwaways | spikes | drops | drop_pct | bad_throws | bad_throw_pct | season | ... | on_tgt_throws | on_tgt_pct | rpo_plays | rpo_yards | rpo_pass_att | rpo_pass_yards | rpo_rush_att | rpo_rush_yards | pa_pass_att | pa_pass_yards | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 541 | Tua Tagovailoa | MIA | 560.0 | 14.0 | 2.0 | 24.0 | 4.4 | 78.0 | 14.3 | 2023 | ... | 430.0 | 79.0 | 111.0 | 1073.0 | 105.0 | 1069.0 | 1.0 | 4.0 | 126.0 | 1145.0 |
| 542 | Jared Goff | DET | 605.0 | 27.0 | 4.0 | 35.0 | 6.1 | 87.0 | 15.2 | 2023 | ... | 461.0 | 80.3 | 23.0 | 189.0 | 22.0 | 186.0 | 1.0 | 3.0 | 151.0 | 1415.0 |
| 543 | Dak Prescott | DAL | 590.0 | 10.0 | 0.0 | 38.0 | 6.6 | 68.0 | 11.7 | 2023 | ... | 479.0 | 82.6 | 89.0 | 696.0 | 80.0 | 671.0 | 5.0 | 25.0 | 100.0 | 613.0 |
| 544 | Josh Allen | BUF | 579.0 | 28.0 | 2.0 | 31.0 | 5.6 | 78.0 | 14.2 | 2023 | ... | 427.0 | 77.8 | 83.0 | 675.0 | 70.0 | 637.0 | 9.0 | 38.0 | 92.0 | 929.0 |
| 545 | Brock Purdy | SF | 444.0 | 12.0 | 3.0 | 9.0 | 2.1 | 70.0 | 16.3 | 2023 | ... | 324.0 | 75.5 | 24.0 | 236.0 | 20.0 | 227.0 | 1.0 | 9.0 | 93.0 | 969.0 |
5 rows × 28 columns
nfl.import_seasonal_pfr('pass', [2023]).describe
<bound method NDFrame.describe of player team pass_attempts throwaways spikes drops drop_pct \
541 Tua Tagovailoa MIA 560.0 14.0 2.0 24.0 4.4
542 Jared Goff DET 605.0 27.0 4.0 35.0 6.1
543 Dak Prescott DAL 590.0 10.0 0.0 38.0 6.6
544 Josh Allen BUF 579.0 28.0 2.0 31.0 5.6
545 Brock Purdy SF 444.0 12.0 3.0 9.0 2.1
.. ... ... ... ... ... ... ...
640 Kadarius Toney KC 1.0 0.0 0.0 0.0 0.0
641 Kyle Trask TB 1.0 0.0 0.0 1.0 100.0
642 Garrett Wilson NYJ 1.0 0.0 0.0 0.0 0.0
643 Christian Kirk JAX 2.0 0.0 0.0 0.0 0.0
644 Ja'Marr Chase CIN 1.0 0.0 0.0 0.0 0.0
bad_throws bad_throw_pct season ... on_tgt_throws on_tgt_pct \
541 78.0 14.3 2023 ... 430.0 79.0
542 87.0 15.2 2023 ... 461.0 80.3
543 68.0 11.7 2023 ... 479.0 82.6
544 78.0 14.2 2023 ... 427.0 77.8
545 70.0 16.3 2023 ... 324.0 75.5
.. ... ... ... ... ... ...
640 1.0 100.0 2023 ... 0.0 0.0
641 0.0 0.0 2023 ... 1.0 100.0
642 1.0 100.0 2023 ... 0.0 0.0
643 1.0 50.0 2023 ... 1.0 50.0
644 0.0 0.0 2023 ... 1.0 100.0
rpo_plays rpo_yards rpo_pass_att rpo_pass_yards rpo_rush_att \
541 111.0 1073.0 105.0 1069.0 1.0
542 23.0 189.0 22.0 186.0 1.0
543 89.0 696.0 80.0 671.0 5.0
544 83.0 675.0 70.0 637.0 9.0
545 24.0 236.0 20.0 227.0 1.0
.. ... ... ... ... ...
640 3.0 14.0 0.0 0.0 3.0
641 0.0 0.0 0.0 0.0 0.0
642 0.0 0.0 0.0 0.0 0.0
643 1.0 -1.0 1.0 -1.0 0.0
644 2.0 -15.0 1.0 -7.0 1.0
rpo_rush_yards pa_pass_att pa_pass_yards
541 4.0 126.0 1145.0
542 3.0 151.0 1415.0
543 25.0 100.0 613.0
544 38.0 92.0 929.0
545 9.0 93.0 969.0
.. ... ... ...
640 14.0 1.0 0.0
641 0.0 0.0 0.0
642 0.0 0.0 0.0
643 0.0 1.0 0.0
644 -8.0 0.0 0.0
[104 rows x 28 columns]>
# Analyzing the columns in 'import_seasonal_pfr'
nfl.import_seasonal_pfr('pass', [2023]).columns
Index(['player', 'team', 'pass_attempts', 'throwaways', 'spikes', 'drops',
'drop_pct', 'bad_throws', 'bad_throw_pct', 'season', 'pfr_id',
'pocket_time', 'times_blitzed', 'times_hurried', 'times_hit',
'times_pressured', 'pressure_pct', 'batted_balls', 'on_tgt_throws',
'on_tgt_pct', 'rpo_plays', 'rpo_yards', 'rpo_pass_att',
'rpo_pass_yards', 'rpo_rush_att', 'rpo_rush_yards', 'pa_pass_att',
'pa_pass_yards'],
dtype='object')
Analyzing PFR(Pro-Football-Reference) API.¶
I am searching for more relevant terms and data.
'rpo' = Run-Pass Option
'pa' = Play Action Plays
'REG' = Regular Season
'POST' = Post-Season / Playoffs
Note : Player 'Joshua Dobbs' played for 2 teams, that's why his team representation is '2TM'. The two teams he played for were the Arizona Cardinals ('ARI'), and the Minnesota Vikings ('MIN').
# Creating a variable for the dataset, and checking for the total NFL Pass Attempts in 2023(REG).
df_seasonal_pfr = nfl.import_seasonal_pfr('pass', [2023])
df_passing_att = df_seasonal_pfr['pass_attempts'].sum()
print(f"Total pass attempts for the whole season for all QB's:\n{df_passing_att:,.0f}")
# Calculating the average pass attempts for NFL Pass Attempts in 2023 (REG).
avg_df_passing_att = df_passing_att / df_seasonal_pfr['pass_attempts'].value_counts().sum()
print(f"Average pass attempts for the whole season for all QB's:\n{avg_df_passing_att:,.0f}")
# Showcasing the DataFrame for visual reference.
df_seasonal_pfr[['player', 'team', 'pass_attempts']].head(10)
Total pass attempts for the whole season for all QB's: 18,315 Average pass attempts for the whole season for all QB's: 176
| player | team | pass_attempts | |
|---|---|---|---|
| 541 | Tua Tagovailoa | MIA | 560.0 |
| 542 | Jared Goff | DET | 605.0 |
| 543 | Dak Prescott | DAL | 590.0 |
| 544 | Josh Allen | BUF | 579.0 |
| 545 | Brock Purdy | SF | 444.0 |
| 546 | Patrick Mahomes | KC | 597.0 |
| 547 | Jordan Love | GB | 579.0 |
| 548 | C.J. Stroud | HOU | 499.0 |
| 549 | Baker Mayfield | TB | 566.0 |
| 550 | Trevor Lawrence | JAX | 564.0 |
4QC & GWD:¶
In football statistics, 4QC stands for Fourth Quarter Comeback.¶
4QC:¶
A quarterback or a team is credited with a 4QC if they meet the following criteria:
Win or Tie: The team must ultimately win or tie the game.
Trailing in the 4th Quarter/OT: The team must have an offensive scoring drive while trailing the opponent at some point in the fourth quarter or overtime.
Scoring Drive Concludes in 4th Quarter/OT: The scoring drive that ties or wins the game for the team must conclude in the fourth quarter or overtime.
Offensive Scoring Play: The tying or winning points must be a result of an offensive drive.
Distinction from a Game-Winning Drive (GWD):¶
A Game-Winning Drive (GWD) is slightly different from a 4QC.
A GWD only requires the team to win the game and have possession of the ball while tied or down by a single score (1-8 points) in the fourth quarter or overtime. The scoring drive must conclude in the fourth quarter or overtime and be the result of an offensive drive.
Essentially, a 4QC focuses on a team overcoming a deficit in the final quarter (or overtime) and securing a win or tie, while a GWD focuses on the specific drive that results in a victory when the score is tied or within a one-score margin in the late stages of the game.
Note: While these metrics are important to note, this is by far in no means to base your QB decision making solely off of these metrics.
Parsing PFR's 'Standard QB Passing' Dataset:¶
# To view all columns displayed in datasets going forward
pd.set_option('display.max_columns', 500)
# Dropping some columns from our dataset
clean_v1 = df_2023_passing.drop(columns=['Rk', 'Awards', 'Player-additional'])
print(clean_v1.head(2))
Player Age Team Pos G GS QBrec Cmp Att Cmp% \
0 Tua Tagovailoa 25.0 MIA QB 17.0 17.0 11-6-0 388.0 560.0 69.3
1 TagoTu00 NaN NaN NaN NaN NaN NaN NaN NaN NaN
Yds TD TD% Int Int% 1D Succ% Lng Y/A AY/A Y/C Y/G \
0 4624.0 29.0 5.2 14.0 2.5 222.0 50.8 78.0 8.3 8.17 11.9 272.0
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Rate QBR Sk Yds.1 Sk% NY/A ANY/A 4QC GWD
0 101.1 60.8 29.0 171.0 4.92 7.56 7.48 2.0 2.0
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN
Cleaning the dataset here and throughout the next following kernels.
NULL Values came from improper QB naming.
# Checking for 'NaN'/NULL values in our table.
clean_v1.isnull().sum().head()
Player 1 Age 13 Team 13 Pos 13 G 13 dtype: int64
# Cleaning NULL values, resetting the indexes.
df_clean_v1 = clean_v1.dropna().reset_index(drop=True)
NULL Values initially in our dataset are displayed above.
Created a variable for a clean, updated version of our dataset. See verification code for this step below.
# Checking if the dataset was cleaned, dropping the rows containing NULL values.
print(df_clean_v1.isnull().sum().head())
Player 0 Age 0 Team 0 Pos 0 G 0 dtype: int64
# The whole overall table that is cleaned with new indexes. We now have all of the real QB stats to start parsing through.
df_clean_v1.head().sort_values(by='Cmp%', ascending=False).reset_index()
| index | Player | Age | Team | Pos | G | GS | QBrec | Cmp | Att | Cmp% | Yds | TD | TD% | Int | Int% | 1D | Succ% | Lng | Y/A | AY/A | Y/C | Y/G | Rate | QBR | Sk | Yds.1 | Sk% | NY/A | ANY/A | 4QC | GWD | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | Dak Prescott | 30.0 | DAL | QB | 17.0 | 17.0 | 12-5-0 | 410.0 | 590.0 | 69.5 | 4516.0 | 36.0 | 6.1 | 9.0 | 1.5 | 222.0 | 51.5 | 92.0 | 7.7 | 8.19 | 11.0 | 265.6 | 105.9 | 72.7 | 39.0 | 255.0 | 6.20 | 6.77 | 7.28 | 2.0 | 3.0 |
| 1 | 4 | Brock Purdy | 24.0 | SFO | QB | 16.0 | 16.0 | 12-4-0 | 308.0 | 444.0 | 69.4 | 4280.0 | 31.0 | 7.0 | 11.0 | 2.5 | 192.0 | 54.7 | 76.0 | 9.6 | 9.92 | 13.9 | 267.5 | 113.0 | 72.8 | 28.0 | 153.0 | 5.93 | 8.74 | 9.01 | 0.0 | 0.0 |
| 2 | 0 | Tua Tagovailoa | 25.0 | MIA | QB | 17.0 | 17.0 | 11-6-0 | 388.0 | 560.0 | 69.3 | 4624.0 | 29.0 | 5.2 | 14.0 | 2.5 | 222.0 | 50.8 | 78.0 | 8.3 | 8.17 | 11.9 | 272.0 | 101.1 | 60.8 | 29.0 | 171.0 | 4.92 | 7.56 | 7.48 | 2.0 | 2.0 |
| 3 | 1 | Jared Goff | 29.0 | DET | QB | 17.0 | 17.0 | 12-5-0 | 407.0 | 605.0 | 67.3 | 4575.0 | 30.0 | 5.0 | 12.0 | 2.0 | 227.0 | 50.9 | 70.0 | 7.6 | 7.66 | 11.2 | 269.1 | 97.9 | 60.3 | 30.0 | 197.0 | 4.72 | 6.89 | 6.99 | 2.0 | 3.0 |
| 4 | 3 | Josh Allen | 27.0 | BUF | QB | 17.0 | 17.0 | 11-6-0 | 385.0 | 579.0 | 66.5 | 4306.0 | 29.0 | 5.0 | 18.0 | 3.1 | 199.0 | 50.7 | 81.0 | 7.4 | 7.04 | 11.2 | 253.3 | 92.2 | 69.6 | 24.0 | 152.0 | 3.98 | 6.89 | 6.51 | 2.0 | 4.0 |
Analyzing our newly cleaned dataset above.
# Searching for the min, max, and mean of different columns in our dataset.
df_clean_v1[['Att', 'Cmp', 'Yds', 'TD', 'Int', 'Cmp%', 'QBR']].agg(
{'Att': ['min', 'max', 'mean'],
'Cmp': ['min', 'max', 'mean'],
'Int': ['min', 'max', 'mean'],
'TD' : ['min', 'max', 'mean'],
'Cmp%': ['min', 'max', 'mean'],
'Yds': ['min', 'max', 'mean'],
'QBR': ['min', 'max', 'mean']
}
)
| Att | Cmp | Int | TD | Cmp% | Yds | QBR | |
|---|---|---|---|---|---|---|---|
| min | 20.0000 | 12.00000 | 0.0000 | 0.000000 | 47.400000 | 62.00000 | 1.200000 |
| max | 612.0000 | 410.00000 | 21.0000 | 36.000000 | 75.500000 | 4624.00000 | 89.900000 |
| mean | 276.1875 | 178.28125 | 6.4375 | 11.328125 | 62.971875 | 1946.53125 | 45.835938 |
Categorical statistical breakdowns. Gathering respective min, max, and mean(avg) values.
# Filtering for our newly cleaned dataset for the following :
# Completion Percentage (Cmp%) of 62 and above, a Quarterback Rating (QBR) of 55, and 75 attempts or more on the season.
df_clean_v1[(df_clean_v1['Cmp%'] >= 62)
& (df_clean_v1['QBR'] >= 55)
& (df_clean_v1['Att'] >= 75)
]
df_clean_v1.head()
| Player | Age | Team | Pos | G | GS | QBrec | Cmp | Att | Cmp% | Yds | TD | TD% | Int | Int% | 1D | Succ% | Lng | Y/A | AY/A | Y/C | Y/G | Rate | QBR | Sk | Yds.1 | Sk% | NY/A | ANY/A | 4QC | GWD | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Tua Tagovailoa | 25.0 | MIA | QB | 17.0 | 17.0 | 11-6-0 | 388.0 | 560.0 | 69.3 | 4624.0 | 29.0 | 5.2 | 14.0 | 2.5 | 222.0 | 50.8 | 78.0 | 8.3 | 8.17 | 11.9 | 272.0 | 101.1 | 60.8 | 29.0 | 171.0 | 4.92 | 7.56 | 7.48 | 2.0 | 2.0 |
| 1 | Jared Goff | 29.0 | DET | QB | 17.0 | 17.0 | 12-5-0 | 407.0 | 605.0 | 67.3 | 4575.0 | 30.0 | 5.0 | 12.0 | 2.0 | 227.0 | 50.9 | 70.0 | 7.6 | 7.66 | 11.2 | 269.1 | 97.9 | 60.3 | 30.0 | 197.0 | 4.72 | 6.89 | 6.99 | 2.0 | 3.0 |
| 2 | Dak Prescott | 30.0 | DAL | QB | 17.0 | 17.0 | 12-5-0 | 410.0 | 590.0 | 69.5 | 4516.0 | 36.0 | 6.1 | 9.0 | 1.5 | 222.0 | 51.5 | 92.0 | 7.7 | 8.19 | 11.0 | 265.6 | 105.9 | 72.7 | 39.0 | 255.0 | 6.20 | 6.77 | 7.28 | 2.0 | 3.0 |
| 3 | Josh Allen | 27.0 | BUF | QB | 17.0 | 17.0 | 11-6-0 | 385.0 | 579.0 | 66.5 | 4306.0 | 29.0 | 5.0 | 18.0 | 3.1 | 199.0 | 50.7 | 81.0 | 7.4 | 7.04 | 11.2 | 253.3 | 92.2 | 69.6 | 24.0 | 152.0 | 3.98 | 6.89 | 6.51 | 2.0 | 4.0 |
| 4 | Brock Purdy | 24.0 | SFO | QB | 16.0 | 16.0 | 12-4-0 | 308.0 | 444.0 | 69.4 | 4280.0 | 31.0 | 7.0 | 11.0 | 2.5 | 192.0 | 54.7 | 76.0 | 9.6 | 9.92 | 13.9 | 267.5 | 113.0 | 72.8 | 28.0 | 153.0 | 5.93 | 8.74 | 9.01 | 0.0 | 0.0 |
Searching for the data of QB's that pass around the range of the mean for key categories : Completion Percentage, QBR, and Attempt Total.
Parameters for filter were chosen based off of the mean and numbers close to the standard deviation in the selected categories, that would also return to us a reasonable dataset of QB's to work with.
Filtering below for the important statistical characteristics from each respective category.
# Presenting a DataFrame of specifically selected QB's for more analysis.
df_clean_v1[(df_clean_v1['Player'] == 'Dak Prescott')
| (df_clean_v1['Player'] == 'Lamar Jackson')
| (df_clean_v1['Player'] == 'Josh Allen')
]
| Player | Age | Team | Pos | G | GS | QBrec | Cmp | Att | Cmp% | Yds | TD | TD% | Int | Int% | 1D | Succ% | Lng | Y/A | AY/A | Y/C | Y/G | Rate | QBR | Sk | Yds.1 | Sk% | NY/A | ANY/A | 4QC | GWD | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Dak Prescott | 30.0 | DAL | QB | 17.0 | 17.0 | 12-5-0 | 410.0 | 590.0 | 69.5 | 4516.0 | 36.0 | 6.1 | 9.0 | 1.5 | 222.0 | 51.5 | 92.0 | 7.7 | 8.19 | 11.0 | 265.6 | 105.9 | 72.7 | 39.0 | 255.0 | 6.20 | 6.77 | 7.28 | 2.0 | 3.0 |
| 3 | Josh Allen | 27.0 | BUF | QB | 17.0 | 17.0 | 11-6-0 | 385.0 | 579.0 | 66.5 | 4306.0 | 29.0 | 5.0 | 18.0 | 3.1 | 199.0 | 50.7 | 81.0 | 7.4 | 7.04 | 11.2 | 253.3 | 92.2 | 69.6 | 24.0 | 152.0 | 3.98 | 6.89 | 6.51 | 2.0 | 4.0 |
| 14 | Lamar Jackson | 26.0 | BAL | QB | 16.0 | 16.0 | 13-3-0 | 307.0 | 457.0 | 67.2 | 3678.0 | 24.0 | 5.3 | 7.0 | 1.5 | 167.0 | 48.2 | 80.0 | 8.0 | 8.41 | 12.0 | 229.9 | 102.7 | 64.7 | 37.0 | 218.0 | 7.49 | 7.00 | 7.34 | 1.0 | 0.0 |
Extra analysis on individually selected player statistics from our newly cleaned dataset.
# QB Completion % (Cmp%) Analysis variable, and value counts.
percentage_search = df_clean_v1['Cmp%'] > 64
percentage_search.value_counts()
True 32 False 32 Name: Cmp%, dtype: int64
Out of all the QB's analyzed, having a ~ 63% completion percentage ['Cmp%] appears to be the mean line between the respective candidates.
64 % is where we have an even number of QB's who took a pass attempt on the season above and below the statline.
# Add before each major section
print("="*80)
print("SECTION 2: EXPLORATORY DATA ANALYSIS")
print("="*80)
print(f"Note: Analysis based on {len(df_clean_v1)} qualified QBs with 75+ pass attempts")
print(f"Statistical confidence: High (large sample size)\n")
================================================================================ SECTION 2: EXPLORATORY DATA ANALYSIS ================================================================================ Note: Analysis based on 64 qualified QBs with 75+ pass attempts Statistical confidence: High (large sample size)
2. Exploratory Data Analysis¶
2.1 4QC & GWD¶
# A version containing some key 'clutch methods' we have uncovered during our analysis.
# Head is set to 32 to grasp most starting QB's data (32 respective NFL clubs).
# -- Main columns to analyze: '4QC', 'GWD' -- *
imp_3_letter_stats = df_clean_v1[['Player', 'Team', 'QBR', '4QC', 'GWD']].sort_values(by=['4QC', 'GWD'], ascending=False).head(32)
imp_3_letter_stats
| Player | Team | QBR | 4QC | GWD | |
|---|---|---|---|---|---|
| 15 | Geno Smith | SEA | 59.5 | 4.0 | 5.0 |
| 18 | Russell Wilson | DEN | 50.7 | 4.0 | 4.0 |
| 13 | Jalen Hurts | PHI | 60.1 | 3.0 | 4.0 |
| 27 | Kenny Pickett | PIT | 38.1 | 3.0 | 3.0 |
| 24 | Zach Wilson | NYJ | 30.6 | 3.0 | 2.0 |
| 3 | Josh Allen | BUF | 69.6 | 2.0 | 4.0 |
| 20 | Desmond Ridder | ATL | 40.1 | 2.0 | 4.0 |
| 1 | Jared Goff | DET | 60.3 | 2.0 | 3.0 |
| 2 | Dak Prescott | DAL | 72.7 | 2.0 | 3.0 |
| 6 | Jordan Love | GNB | 62.1 | 2.0 | 3.0 |
| 0 | Tua Tagovailoa | MIA | 60.8 | 2.0 | 2.0 |
| 11 | Sam Howell | WAS | 42.4 | 2.0 | 2.0 |
| 16 | Gardner Minshew II | IND | 59.6 | 2.0 | 2.0 |
| 19 | Bryce Young | CAR | 33.4 | 2.0 | 2.0 |
| 28 | Jake Browning | CIN | 60.1 | 2.0 | 2.0 |
| 30 | Kyler Murray | ARI | 47.2 | 2.0 | 2.0 |
| 36 | Jimmy Garoppolo | LVR | 33.9 | 2.0 | 2.0 |
| 45 | P.J. Walker | CLE | 18.6 | 2.0 | 2.0 |
| 7 | C.J. Stroud | HOU | 57.5 | 1.0 | 3.0 |
| 5 | Patrick Mahomes | KAN | 63.1 | 1.0 | 2.0 |
| 8 | Baker Mayfield | TAM | 54.3 | 1.0 | 2.0 |
| 9 | Trevor Lawrence | JAX | 56.1 | 1.0 | 2.0 |
| 10 | Matthew Stafford | LAR | 63.5 | 1.0 | 2.0 |
| 39 | Tommy DeVito | NYG | 23.7 | 1.0 | 2.0 |
| 17 | Justin Herbert | LAC | 64.1 | 1.0 | 1.0 |
| 21 | Justin Fields | CHI | 46.1 | 1.0 | 1.0 |
| 26 | Mac Jones | NWE | 36.7 | 1.0 | 1.0 |
| 29 | Will Levis | TEN | 33.2 | 1.0 | 1.0 |
| 31 | Joe Flacco | CLE | 48.3 | 1.0 | 1.0 |
| 32 | Ryan Tannehill | TEN | 35.1 | 1.0 | 1.0 |
| 38 | Deshaun Watson | CLE | 42.9 | 1.0 | 1.0 |
| 40 | Daniel Jones | NYG | 36.3 | 1.0 | 1.0 |
Formed a new table/variable showcasing the key "clutch" factors 4QC and GWD, emphasized from before.
We have our Player Names, Teams, QBR, 4QC, and GWD data.
Players are listed in order of those who have the highest to lowest amount of 4QC's, and GWD's for the season.
# Setting filters for the DataFrame.
elite_draft1 = imp_3_letter_stats.loc[(imp_3_letter_stats['GWD'] >= 2)
& (imp_3_letter_stats['4QC'] >= 2)
& (imp_3_letter_stats['QBR'] >= 59)
].sort_values(by=['4QC', 'GWD', 'QBR'], ascending=False).reset_index()
elite_draft1
| index | Player | Team | QBR | 4QC | GWD | |
|---|---|---|---|---|---|---|
| 0 | 15 | Geno Smith | SEA | 59.5 | 4.0 | 5.0 |
| 1 | 13 | Jalen Hurts | PHI | 60.1 | 3.0 | 4.0 |
| 2 | 3 | Josh Allen | BUF | 69.6 | 2.0 | 4.0 |
| 3 | 2 | Dak Prescott | DAL | 72.7 | 2.0 | 3.0 |
| 4 | 6 | Jordan Love | GNB | 62.1 | 2.0 | 3.0 |
| 5 | 1 | Jared Goff | DET | 60.3 | 2.0 | 3.0 |
| 6 | 0 | Tua Tagovailoa | MIA | 60.8 | 2.0 | 2.0 |
| 7 | 28 | Jake Browning | CIN | 60.1 | 2.0 | 2.0 |
| 8 | 16 | Gardner Minshew II | IND | 59.6 | 2.0 | 2.0 |
Above is a list of QB's who have atleast 2 Game-Winning Drives ('GWD'), atleast 2 Fourth Quarter Comebacks ('4QC'), and atleast an overall season QBR of 59.0.
Analysis on who performed best in a high number of close game situations. An initial analysis on some already defined "clutch" moments in the sport.
Visualizing Elite Quarterback Performance: QBR, Comebacks, and Game-Winning Drives¶
After identifying a select group of quarterbacks who led the league in 4th-Quarter Comebacks (4QC) and Game-Winning Drives (GWD) during the 2023 season, this section aims to visually compare their performance across three key metrics:
Total QBR: A holistic measure of a quarterback's overall effectiveness.
4th-Quarter Comebacks (4QC): The number of times a QB led their team to a win after trailing in the fourth quarter.
Game-Winning Drives (GWD): The number of times a QB led their team on an offensive drive that resulted in the winning score at the end of a game.
These visualizations allow for a clear, side-by-side comparison, helping us understand not only who is statistically "clutch," but also how that "clutch" ability relates to their overall performance rating (QBR).
# Elite Draft Data Visualization
sns.set_theme()
sns.set_style("darkgrid")
sns.barplot(
data=elite_draft1,
x='QBR', y='Player', hue='Player', palette='YlGnBu'
)
# Visualization of QBR Ratings of selected candidates
plt.title("QBR Ratings of NFL QB's with the Most GWD's and 4QC's in 2023")
plt.xlabel("QBR Rating")
plt.show()
sns.barplot(
data=elite_draft1,
x='4QC', y='Player', hue='Player', palette='magma_r'
)
# Visualization of 4QC Ratings of selected candidates
plt.title("4th-Quarter Comeback Amounts By NFL QB's in 2023")
plt.xlabel("# of Fourth-Quarter Comebacks")
plt.show()
sns.barplot(
data=elite_draft1,
x='GWD', y='Player', hue='Player', palette='magma_r'
)
# Visualization of GWC Ratings of selected candidates
plt.title("Game Winning Drive(s) Amount by NFL QB's in 2023")
plt.xlabel("# of Game Winning Drives")
plt.show()
Insights from the Visualizations:¶
QBR Ratings: The first chart ranks these quarterbacks by their Total QBR. This provides a baseline understanding of their overall efficiency and contribution to winning throughout the entire game, not just in final moments.
Comebacks and Game-Winning Drives: The subsequent bar charts visually emphasize the high-stakes nature of these metrics. They clearly display the raw counts of 4th-quarter comebacks and game-winning drives, allowing us to quickly identify the leaders in these crucial, narrative-defining categories.
Together, these plots offer a compelling visual narrative. They begin to form some sense of the "clutch" status of these quarterbacks by quantifying their late-game heroics and juxtapose it with a broader performance metric like QBR, providing a more complete picture of their value.
Right away we can see players like Geno Smith who find more success in respect to their peers these key "game-time" or "clutch" situations.
(Honorable Mentions: Josh Allen, Dak Prescott, Jalen Hurts)
2.2 Exploratory Data Analysis: Performance By Down¶
Play-by-Play¶
# Analyzing all of the columns in the play-by-play NFL Dataset from 2023.
pbp_columns = nfl.import_pbp_data([2023]).columns
list_pbp_columns = list(pbp_columns)
# Creating a list of the columns for better understanding of the data structure of our play-by-play dataset.
# print(list_pbp_columns)
pbp_df = nfl.import_pbp_data([2023])
2023 done. Downcasting floats. 2023 done. Downcasting floats.
Preparation for viewing our dataset.
# Regex Expression call
import re
# Availabilty to see all columns in the dataset
pd.set_option('display.max_colwidth', None)
# Making a variable for conditional values, in this case that is the specific Down a play has occured in our dataset.
conditions = (pbp_df['play_type'] == 'pass') & (pbp_df['down'].isin([1.0, 2.0, 3.0, 4.0]))
# Creating new copy grouped by our conditions input above.
pass_plays_df = pbp_df.loc[conditions].copy()
# Showcasing new DataFrame as a Series.
print(pass_plays_df.head())
play_id game_id old_game_id_x home_team away_team season_type \
3 77.0 2023_01_ARI_WAS 2023091007 WAS ARI REG
5 124.0 2023_01_ARI_WAS 2023091007 WAS ARI REG
6 147.0 2023_01_ARI_WAS 2023091007 WAS ARI REG
7 172.0 2023_01_ARI_WAS 2023091007 WAS ARI REG
8 197.0 2023_01_ARI_WAS 2023091007 WAS ARI REG
week posteam posteam_type defteam side_of_field yardline_100 game_date \
3 1 WAS home ARI WAS 72.0 2023-09-10
5 1 WAS home ARI WAS 64.0 2023-09-10
6 1 WAS home ARI WAS 64.0 2023-09-10
7 1 WAS home ARI WAS 52.0 2023-09-10
8 1 WAS home ARI WAS 51.0 2023-09-10
quarter_seconds_remaining half_seconds_remaining game_seconds_remaining \
3 870.0 1770.0 3570.0
5 796.0 1696.0 3496.0
6 792.0 1692.0 3492.0
7 754.0 1654.0 3454.0
8 716.0 1616.0 3416.0
game_half quarter_end drive sp qtr down goal_to_go time yrdln \
3 Half1 0.0 1.0 0.0 1.0 2.0 0 14:30 WAS 28
5 Half1 0.0 1.0 0.0 1.0 1.0 0 13:16 WAS 36
6 Half1 0.0 1.0 0.0 1.0 2.0 0 13:12 WAS 36
7 Half1 0.0 1.0 0.0 1.0 1.0 0 12:34 WAS 48
8 Half1 0.0 1.0 0.0 1.0 2.0 0 11:56 WAS 49
ydstogo ydsnet \
3 7.0 26.0
5 10.0 26.0
6 10.0 26.0
7 10.0 26.0
8 9.0 26.0
desc \
3 (14:30) (Shotgun) 14-S.Howell pass short right to 1-J.Dotson to WAS 34 for 6 yards (13-K.Clark, 10-J.Woods).
5 (13:16) (Shotgun) 14-S.Howell pass incomplete short middle to 82-L.Thomas.
6 (13:12) 14-S.Howell pass short middle to 1-J.Dotson to WAS 48 for 12 yards (13-K.Clark).
7 (12:34) (Shotgun) 14-S.Howell pass short left to 4-C.Samuel to WAS 49 for 1 yard (34-J.Thompson).
8 (11:56) (Shotgun) 14-S.Howell pass incomplete short left to 82-L.Thomas (22-K.Wallace).
play_type yards_gained shotgun no_huddle qb_dropback qb_kneel \
3 pass 6.0 1.0 0.0 1.0 0.0
5 pass 0.0 1.0 0.0 1.0 0.0
6 pass 12.0 0.0 0.0 1.0 0.0
7 pass 1.0 1.0 0.0 1.0 0.0
8 pass 0.0 1.0 0.0 1.0 0.0
qb_spike qb_scramble pass_length pass_location air_yards \
3 0.0 0.0 short right 6.0
5 0.0 0.0 short middle 10.0
6 0.0 0.0 short middle 12.0
7 0.0 0.0 short left -4.0
8 0.0 0.0 short left 3.0
yards_after_catch run_location run_gap field_goal_result kick_distance \
3 0.0 None None None NaN
5 NaN None None None NaN
6 0.0 None None None NaN
7 5.0 None None None NaN
8 NaN None None None NaN
extra_point_result two_point_conv_result home_timeouts_remaining \
3 None None 3.0
5 None None 3.0
6 None None 3.0
7 None None 3.0
8 None None 3.0
away_timeouts_remaining timeout timeout_team td_team td_player_name \
3 3.0 0.0 None None None
5 3.0 0.0 None None None
6 3.0 0.0 None None None
7 3.0 0.0 None None None
8 3.0 0.0 None None None
td_player_id posteam_timeouts_remaining defteam_timeouts_remaining \
3 None 3.0 3.0
5 None 3.0 3.0
6 None 3.0 3.0
7 None 3.0 3.0
8 None 3.0 3.0
total_home_score total_away_score posteam_score defteam_score \
3 0.0 0.0 0.0 0.0
5 0.0 0.0 0.0 0.0
6 0.0 0.0 0.0 0.0
7 0.0 0.0 0.0 0.0
8 0.0 0.0 0.0 0.0
score_differential posteam_score_post defteam_score_post \
3 0.0 0.0 0.0
5 0.0 0.0 0.0
6 0.0 0.0 0.0
7 0.0 0.0 0.0
8 0.0 0.0 0.0
score_differential_post no_score_prob opp_fg_prob opp_safety_prob \
3 0.0 0.004854 0.152473 0.002145
5 0.0 0.005168 0.114385 0.001770
6 0.0 0.005341 0.129647 0.002001
7 0.0 0.004763 0.090331 0.001307
8 0.0 0.005918 0.104577 0.001312
opp_td_prob fg_prob safety_prob td_prob extra_point_prob \
3 0.246986 0.202101 0.003557 0.387884 0.0
5 0.181144 0.233607 0.004500 0.459426 0.0
6 0.210970 0.220481 0.004615 0.426946 0.0
7 0.143480 0.264350 0.004551 0.491218 0.0
8 0.169272 0.264175 0.005445 0.449301 0.0
two_point_conversion_prob ep epa total_home_epa \
3 0.0 1.137994 0.703308 0.367204
5 0.0 2.311101 -0.521544 0.315459
6 0.0 1.789557 1.173154 1.488614
7 0.0 2.962712 -0.515451 0.973162
8 0.0 2.447260 -0.872005 0.101157
total_away_epa total_home_rush_epa total_away_rush_epa \
3 -0.367204 -0.336103 0.336103
5 -0.315459 0.133696 -0.133696
6 -1.488614 0.133696 -0.133696
7 -0.973162 0.133696 -0.133696
8 -0.101157 0.133696 -0.133696
total_home_pass_epa total_away_pass_epa air_epa yac_epa comp_air_epa \
3 0.703308 -0.703308 0.703308 0.000000 0.703308
5 0.181764 -0.181764 0.621026 -1.142570 0.000000
6 1.354918 -1.354918 1.173154 0.000000 1.173154
7 0.839467 -0.839467 -1.110129 0.594677 -1.110129
8 -0.032539 0.032539 -0.659084 -0.212921 0.000000
comp_yac_epa total_home_comp_air_epa total_away_comp_air_epa \
3 0.000000 0.703308 -0.703308
5 0.000000 0.703308 -0.703308
6 0.000000 1.876462 -1.876462
7 0.594677 0.766333 -0.766333
8 0.000000 0.766333 -0.766333
total_home_comp_yac_epa total_away_comp_yac_epa total_home_raw_air_epa \
3 0.000000 0.000000 0.703308
5 0.000000 0.000000 1.324334
6 0.000000 0.000000 2.497488
7 0.594677 -0.594677 1.387359
8 0.594677 -0.594677 0.728275
total_away_raw_air_epa total_home_raw_yac_epa total_away_raw_yac_epa \
3 -0.703308 0.000000 0.000000
5 -1.324334 -1.142570 1.142570
6 -2.497488 -1.142570 1.142570
7 -1.387359 -0.547892 0.547892
8 -0.728275 -0.760814 0.760814
wp def_wp home_wp away_wp wpa vegas_wpa \
3 0.539621 0.460379 0.539621 0.460379 0.016367 0.011024
5 0.572573 0.427427 0.572573 0.427427 -0.018037 -0.006076
6 0.554537 0.445463 0.554537 0.445463 0.029910 -0.006797
7 0.584447 0.415553 0.584447 0.415553 -0.017915 -0.039551
8 0.566532 0.433468 0.566532 0.433468 -0.035688 0.003150
vegas_home_wpa home_wp_post away_wp_post vegas_wp vegas_home_wp \
3 0.011024 0.555987 0.444013 0.717126 0.717126
5 -0.006076 0.554537 0.445463 0.740649 0.740649
6 -0.006797 0.584447 0.415553 0.734572 0.734572
7 -0.039551 0.566532 0.433468 0.727775 0.727775
8 0.003150 0.530844 0.469156 0.688224 0.688224
total_home_rush_wpa total_away_rush_wpa total_home_pass_wpa \
3 -0.006641 0.006641 0.016367
5 0.009945 -0.009945 -0.001670
6 0.009945 -0.009945 0.028240
7 0.009945 -0.009945 0.010325
8 0.009945 -0.009945 -0.025363
total_away_pass_wpa air_wpa yac_wpa comp_air_wpa comp_yac_wpa \
3 -0.016367 0.016367 0.000000 0.016367 0.000000
5 0.001670 0.000000 -0.018037 0.000000 0.000000
6 -0.028240 0.029910 0.000000 0.029910 0.000000
7 -0.010325 0.000000 -0.017915 0.000000 -0.017915
8 0.025363 0.000000 -0.035688 0.000000 0.000000
total_home_comp_air_wpa total_away_comp_air_wpa total_home_comp_yac_wpa \
3 0.016367 -0.016367 0.000000
5 0.016367 -0.016367 0.000000
6 0.046277 -0.046277 0.000000
7 0.046277 -0.046277 -0.017915
8 0.046277 -0.046277 -0.017915
total_away_comp_yac_wpa total_home_raw_air_wpa total_away_raw_air_wpa \
3 0.000000 0.016367 -0.016367
5 0.000000 0.016367 -0.016367
6 0.000000 0.046277 -0.046277
7 0.017915 0.046277 -0.046277
8 0.017915 0.046277 -0.046277
total_home_raw_yac_wpa total_away_raw_yac_wpa punt_blocked \
3 0.000000 0.000000 0.0
5 -0.018037 0.018037 0.0
6 -0.018037 0.018037 0.0
7 -0.035952 0.035952 0.0
8 -0.071639 0.071639 0.0
first_down_rush first_down_pass first_down_penalty third_down_converted \
3 0.0 0.0 0.0 0.0
5 0.0 0.0 0.0 0.0
6 0.0 1.0 0.0 0.0
7 0.0 0.0 0.0 0.0
8 0.0 0.0 0.0 0.0
third_down_failed fourth_down_converted fourth_down_failed \
3 0.0 0.0 0.0
5 0.0 0.0 0.0
6 0.0 0.0 0.0
7 0.0 0.0 0.0
8 0.0 0.0 0.0
incomplete_pass touchback interception punt_inside_twenty \
3 0.0 0.0 0.0 0.0
5 1.0 0.0 0.0 0.0
6 0.0 0.0 0.0 0.0
7 0.0 0.0 0.0 0.0
8 1.0 0.0 0.0 0.0
punt_in_endzone punt_out_of_bounds punt_downed punt_fair_catch \
3 0.0 0.0 0.0 0.0
5 0.0 0.0 0.0 0.0
6 0.0 0.0 0.0 0.0
7 0.0 0.0 0.0 0.0
8 0.0 0.0 0.0 0.0
kickoff_inside_twenty kickoff_in_endzone kickoff_out_of_bounds \
3 0.0 0.0 0.0
5 0.0 0.0 0.0
6 0.0 0.0 0.0
7 0.0 0.0 0.0
8 0.0 0.0 0.0
kickoff_downed kickoff_fair_catch fumble_forced fumble_not_forced \
3 0.0 0.0 0.0 0.0
5 0.0 0.0 0.0 0.0
6 0.0 0.0 0.0 0.0
7 0.0 0.0 0.0 0.0
8 0.0 0.0 0.0 0.0
fumble_out_of_bounds solo_tackle safety penalty tackled_for_loss \
3 0.0 0.0 0.0 0.0 0.0
5 0.0 0.0 0.0 0.0 0.0
6 0.0 1.0 0.0 0.0 0.0
7 0.0 1.0 0.0 0.0 0.0
8 0.0 0.0 0.0 0.0 0.0
fumble_lost own_kickoff_recovery own_kickoff_recovery_td qb_hit \
3 0.0 0.0 0.0 0.0
5 0.0 0.0 0.0 0.0
6 0.0 0.0 0.0 0.0
7 0.0 0.0 0.0 0.0
8 0.0 0.0 0.0 0.0
rush_attempt pass_attempt sack touchdown pass_touchdown \
3 0.0 1.0 0.0 0.0 0.0
5 0.0 1.0 0.0 0.0 0.0
6 0.0 1.0 0.0 0.0 0.0
7 0.0 1.0 0.0 0.0 0.0
8 0.0 1.0 0.0 0.0 0.0
rush_touchdown return_touchdown extra_point_attempt two_point_attempt \
3 0.0 0.0 0.0 0.0
5 0.0 0.0 0.0 0.0
6 0.0 0.0 0.0 0.0
7 0.0 0.0 0.0 0.0
8 0.0 0.0 0.0 0.0
field_goal_attempt kickoff_attempt punt_attempt fumble complete_pass \
3 0.0 0.0 0.0 0.0 1.0
5 0.0 0.0 0.0 0.0 0.0
6 0.0 0.0 0.0 0.0 1.0
7 0.0 0.0 0.0 0.0 1.0
8 0.0 0.0 0.0 0.0 0.0
assist_tackle lateral_reception lateral_rush lateral_return \
3 1.0 0.0 0.0 0.0
5 0.0 0.0 0.0 0.0
6 0.0 0.0 0.0 0.0
7 0.0 0.0 0.0 0.0
8 0.0 0.0 0.0 0.0
lateral_recovery passer_player_id passer_player_name passing_yards \
3 0.0 00-0037077 S.Howell 6.0
5 0.0 00-0037077 S.Howell NaN
6 0.0 00-0037077 S.Howell 12.0
7 0.0 00-0037077 S.Howell 1.0
8 0.0 00-0037077 S.Howell NaN
receiver_player_id receiver_player_name receiving_yards rusher_player_id \
3 00-0037741 J.Dotson 6.0 None
5 00-0031260 L.Thomas NaN None
6 00-0037741 J.Dotson 12.0 None
7 00-0033282 C.Samuel 1.0 None
8 00-0031260 L.Thomas NaN None
rusher_player_name rushing_yards lateral_receiver_player_id \
3 None NaN None
5 None NaN None
6 None NaN None
7 None NaN None
8 None NaN None
lateral_receiver_player_name lateral_receiving_yards \
3 None NaN
5 None NaN
6 None NaN
7 None NaN
8 None NaN
lateral_rusher_player_id lateral_rusher_player_name lateral_rushing_yards \
3 None None NaN
5 None None NaN
6 None None NaN
7 None None NaN
8 None None NaN
lateral_sack_player_id lateral_sack_player_name interception_player_id \
3 None None None
5 None None None
6 None None None
7 None None None
8 None None None
interception_player_name lateral_interception_player_id \
3 None None
5 None None
6 None None
7 None None
8 None None
lateral_interception_player_name punt_returner_player_id \
3 None None
5 None None
6 None None
7 None None
8 None None
punt_returner_player_name lateral_punt_returner_player_id \
3 None None
5 None None
6 None None
7 None None
8 None None
lateral_punt_returner_player_name kickoff_returner_player_name \
3 None None
5 None None
6 None None
7 None None
8 None None
kickoff_returner_player_id lateral_kickoff_returner_player_id \
3 None None
5 None None
6 None None
7 None None
8 None None
lateral_kickoff_returner_player_name punter_player_id punter_player_name \
3 None None None
5 None None None
6 None None None
7 None None None
8 None None None
kicker_player_name kicker_player_id own_kickoff_recovery_player_id \
3 None None None
5 None None None
6 None None None
7 None None None
8 None None None
own_kickoff_recovery_player_name blocked_player_id blocked_player_name \
3 None None None
5 None None None
6 None None None
7 None None None
8 None None None
tackle_for_loss_1_player_id tackle_for_loss_1_player_name \
3 None None
5 None None
6 None None
7 None None
8 None None
tackle_for_loss_2_player_id tackle_for_loss_2_player_name \
3 None None
5 None None
6 None None
7 None None
8 None None
qb_hit_1_player_id qb_hit_1_player_name qb_hit_2_player_id \
3 None None None
5 None None None
6 None None None
7 None None None
8 None None None
qb_hit_2_player_name forced_fumble_player_1_team \
3 None None
5 None None
6 None None
7 None None
8 None None
forced_fumble_player_1_player_id forced_fumble_player_1_player_name \
3 None None
5 None None
6 None None
7 None None
8 None None
forced_fumble_player_2_team forced_fumble_player_2_player_id \
3 None None
5 None None
6 None None
7 None None
8 None None
forced_fumble_player_2_player_name solo_tackle_1_team solo_tackle_2_team \
3 None None None
5 None None None
6 None ARI None
7 None ARI None
8 None None None
solo_tackle_1_player_id solo_tackle_2_player_id solo_tackle_1_player_name \
3 None None None
5 None None None
6 00-0038984 None K.Clark
7 00-0035705 None J.Thompson
8 None None None
solo_tackle_2_player_name assist_tackle_1_player_id \
3 None 00-0034801
5 None None
6 None None
7 None None
8 None None
assist_tackle_1_player_name assist_tackle_1_team assist_tackle_2_player_id \
3 J.Woods ARI None
5 None None None
6 None None None
7 None None None
8 None None None
assist_tackle_2_player_name assist_tackle_2_team assist_tackle_3_player_id \
3 None None None
5 None None None
6 None None None
7 None None None
8 None None None
assist_tackle_3_player_name assist_tackle_3_team assist_tackle_4_player_id \
3 None None None
5 None None None
6 None None None
7 None None None
8 None None None
assist_tackle_4_player_name assist_tackle_4_team tackle_with_assist \
3 None None 1.0
5 None None 0.0
6 None None 0.0
7 None None 0.0
8 None None 0.0
tackle_with_assist_1_player_id tackle_with_assist_1_player_name \
3 00-0038984 K.Clark
5 None None
6 None None
7 None None
8 None None
tackle_with_assist_1_team tackle_with_assist_2_player_id \
3 ARI None
5 None None
6 None None
7 None None
8 None None
tackle_with_assist_2_player_name tackle_with_assist_2_team \
3 None None
5 None None
6 None None
7 None None
8 None None
pass_defense_1_player_id pass_defense_1_player_name \
3 None None
5 None None
6 None None
7 None None
8 00-0036395 K.Wallace
pass_defense_2_player_id pass_defense_2_player_name fumbled_1_team \
3 None None None
5 None None None
6 None None None
7 None None None
8 None None None
fumbled_1_player_id fumbled_1_player_name fumbled_2_player_id \
3 None None None
5 None None None
6 None None None
7 None None None
8 None None None
fumbled_2_player_name fumbled_2_team fumble_recovery_1_team \
3 None None None
5 None None None
6 None None None
7 None None None
8 None None None
fumble_recovery_1_yards fumble_recovery_1_player_id \
3 NaN None
5 NaN None
6 NaN None
7 NaN None
8 NaN None
fumble_recovery_1_player_name fumble_recovery_2_team \
3 None None
5 None None
6 None None
7 None None
8 None None
fumble_recovery_2_yards fumble_recovery_2_player_id \
3 NaN None
5 NaN None
6 NaN None
7 NaN None
8 NaN None
fumble_recovery_2_player_name sack_player_id sack_player_name \
3 None None None
5 None None None
6 None None None
7 None None None
8 None None None
half_sack_1_player_id half_sack_1_player_name half_sack_2_player_id \
3 None None None
5 None None None
6 None None None
7 None None None
8 None None None
half_sack_2_player_name return_team return_yards penalty_team \
3 None None 0.0 None
5 None None 0.0 None
6 None None 0.0 None
7 None None 0.0 None
8 None None 0.0 None
penalty_player_id penalty_player_name penalty_yards replay_or_challenge \
3 None None NaN 0.0
5 None None NaN 0.0
6 None None NaN 0.0
7 None None NaN 0.0
8 None None NaN 0.0
replay_or_challenge_result penalty_type defensive_two_point_attempt \
3 None None 0.0
5 None None 0.0
6 None None 0.0
7 None None 0.0
8 None None 0.0
defensive_two_point_conv defensive_extra_point_attempt \
3 0.0 0.0
5 0.0 0.0
6 0.0 0.0
7 0.0 0.0
8 0.0 0.0
defensive_extra_point_conv safety_player_name safety_player_id season \
3 0.0 None None 2023
5 0.0 None None 2023
6 0.0 None None 2023
7 0.0 None None 2023
8 0.0 None None 2023
cp cpoe series series_success series_result order_sequence \
3 0.747638 25.236183 1.0 1.0 First down 77.0
5 0.707635 -70.763489 2.0 1.0 First down 124.0
6 0.722689 27.731085 2.0 1.0 First down 147.0
7 0.879122 12.087756 3.0 0.0 Punt 172.0
8 0.779528 -77.952835 3.0 0.0 Punt 197.0
start_time time_of_day stadium \
3 9/10/23, 13:02:43 2023-09-10T17:03:52.567Z Commanders Field
5 9/10/23, 13:02:43 2023-09-10T17:05:05.807Z Commanders Field
6 9/10/23, 13:02:43 2023-09-10T17:05:41.710Z Commanders Field
7 9/10/23, 13:02:43 2023-09-10T17:06:20.083Z Commanders Field
8 9/10/23, 13:02:43 2023-09-10T17:06:58.453Z Commanders Field
weather \
3 Cloudy Temp: 76° F, Humidity: 84%, Wind: S 2 mph
5 Cloudy Temp: 76° F, Humidity: 84%, Wind: S 2 mph
6 Cloudy Temp: 76° F, Humidity: 84%, Wind: S 2 mph
7 Cloudy Temp: 76° F, Humidity: 84%, Wind: S 2 mph
8 Cloudy Temp: 76° F, Humidity: 84%, Wind: S 2 mph
nfl_api_id play_clock play_deleted \
3 b07c705e-f053-11ed-b4a7-bab79e4492fa 0 0.0
5 b07c705e-f053-11ed-b4a7-bab79e4492fa 0 0.0
6 b07c705e-f053-11ed-b4a7-bab79e4492fa 0 0.0
7 b07c705e-f053-11ed-b4a7-bab79e4492fa 0 0.0
8 b07c705e-f053-11ed-b4a7-bab79e4492fa 0 0.0
play_type_nfl special_teams_play st_play_type end_clock_time \
3 PASS 0.0 None 2023-09-10T17:03:56.907Z
5 PASS 0.0 None 2023-09-10T17:05:10.047Z
6 PASS 0.0 None 2023-09-10T17:05:46.983Z
7 PASS 0.0 None 2023-09-10T17:06:25.487Z
8 PASS 0.0 None 2023-09-10T17:07:02.757Z
end_yard_line fixed_drive fixed_drive_result drive_real_start_time \
3 None 1.0 Punt 2023-09-10T17:02:43.600Z
5 None 1.0 Punt 2023-09-10T17:02:43.600Z
6 None 1.0 Punt 2023-09-10T17:02:43.600Z
7 None 1.0 Punt 2023-09-10T17:02:43.600Z
8 None 1.0 Punt 2023-09-10T17:02:43.600Z
drive_play_count drive_time_of_possession drive_first_downs \
3 8.0 4:01 2.0
5 8.0 4:01 2.0
6 8.0 4:01 2.0
7 8.0 4:01 2.0
8 8.0 4:01 2.0
drive_inside20 drive_ended_with_score drive_quarter_start \
3 0.0 0.0 1.0
5 0.0 0.0 1.0
6 0.0 0.0 1.0
7 0.0 0.0 1.0
8 0.0 0.0 1.0
drive_quarter_end drive_yards_penalized drive_start_transition \
3 1.0 0.0 KICKOFF
5 1.0 0.0 KICKOFF
6 1.0 0.0 KICKOFF
7 1.0 0.0 KICKOFF
8 1.0 0.0 KICKOFF
drive_end_transition drive_game_clock_start drive_game_clock_end \
3 PUNT 15:00 10:59
5 PUNT 15:00 10:59
6 PUNT 15:00 10:59
7 PUNT 15:00 10:59
8 PUNT 15:00 10:59
drive_start_yard_line drive_end_yard_line drive_play_id_started \
3 WAS 25 ARI 49 39.0
5 WAS 25 ARI 49 39.0
6 WAS 25 ARI 49 39.0
7 WAS 25 ARI 49 39.0
8 WAS 25 ARI 49 39.0
drive_play_id_ended away_score home_score location result total \
3 245.0 16 20 Home 4 36
5 245.0 16 20 Home 4 36
6 245.0 16 20 Home 4 36
7 245.0 16 20 Home 4 36
8 245.0 16 20 Home 4 36
spread_line total_line div_game roof surface temp wind \
3 7.0 38.0 0 outdoors NaN NaN
5 7.0 38.0 0 outdoors NaN NaN
6 7.0 38.0 0 outdoors NaN NaN
7 7.0 38.0 0 outdoors NaN NaN
8 7.0 38.0 0 outdoors NaN NaN
home_coach away_coach stadium_id game_stadium aborted_play success \
3 Ron Rivera Jonathan Gannon WAS00 FedExField 0.0 1.0
5 Ron Rivera Jonathan Gannon WAS00 FedExField 0.0 0.0
6 Ron Rivera Jonathan Gannon WAS00 FedExField 0.0 1.0
7 Ron Rivera Jonathan Gannon WAS00 FedExField 0.0 0.0
8 Ron Rivera Jonathan Gannon WAS00 FedExField 0.0 0.0
passer passer_jersey_number rusher rusher_jersey_number receiver \
3 S.Howell 14.0 None NaN J.Dotson
5 S.Howell 14.0 None NaN L.Thomas
6 S.Howell 14.0 None NaN J.Dotson
7 S.Howell 14.0 None NaN C.Samuel
8 S.Howell 14.0 None NaN L.Thomas
receiver_jersey_number pass rush first_down special play passer_id \
3 1.0 1.0 0.0 0.0 0.0 1.0 00-0037077
5 82.0 1.0 0.0 0.0 0.0 1.0 00-0037077
6 1.0 1.0 0.0 1.0 0.0 1.0 00-0037077
7 4.0 1.0 0.0 0.0 0.0 1.0 00-0037077
8 82.0 1.0 0.0 0.0 0.0 1.0 00-0037077
rusher_id receiver_id name jersey_number id \
3 None 00-0037741 S.Howell 14.0 00-0037077
5 None 00-0031260 S.Howell 14.0 00-0037077
6 None 00-0037741 S.Howell 14.0 00-0037077
7 None 00-0033282 S.Howell 14.0 00-0037077
8 None 00-0031260 S.Howell 14.0 00-0037077
fantasy_player_name fantasy_player_id fantasy fantasy_id out_of_bounds \
3 J.Dotson 00-0037741 J.Dotson 00-0037741 0.0
5 L.Thomas 00-0031260 L.Thomas 00-0031260 0.0
6 J.Dotson 00-0037741 J.Dotson 00-0037741 0.0
7 C.Samuel 00-0033282 C.Samuel 00-0033282 0.0
8 L.Thomas 00-0031260 L.Thomas 00-0031260 0.0
home_opening_kickoff qb_epa xyac_epa xyac_mean_yardage \
3 1.0 0.703308 0.340652 3.328642
5 1.0 -0.521544 0.234473 4.626063
6 1.0 1.173154 0.304367 4.480009
7 1.0 -0.515451 1.168102 10.487875
8 1.0 -0.872005 0.908345 4.576524
xyac_median_yardage xyac_success xyac_fd xpass pass_oe \
3 1.0 0.996628 0.583928 0.661106 33.889408
5 3.0 0.999221 0.979605 0.495536 50.446377
6 2.0 1.000000 0.997461 0.563005 43.699486
7 9.0 0.472213 0.253578 0.484261 51.573910
8 3.0 0.441858 0.276978 0.709240 29.076004
nflverse_game_id old_game_id_y possession_team offense_formation \
3 2023_01_ARI_WAS 2023091007 WAS SHOTGUN
5 2023_01_ARI_WAS 2023091007 WAS SHOTGUN
6 2023_01_ARI_WAS 2023091007 WAS SINGLEBACK
7 2023_01_ARI_WAS 2023091007 WAS SHOTGUN
8 2023_01_ARI_WAS 2023091007 WAS SHOTGUN
offense_personnel defenders_in_box defense_personnel \
3 1 RB, 1 TE, 3 WR 6.0 2 DL, 4 LB, 5 DB
5 1 RB, 1 TE, 3 WR 6.0 3 DL, 3 LB, 5 DB
6 1 RB, 1 TE, 3 WR 6.0 3 DL, 3 LB, 5 DB
7 1 RB, 1 TE, 3 WR 6.0 3 DL, 3 LB, 5 DB
8 1 RB, 1 TE, 3 WR 6.0 2 DL, 4 LB, 5 DB
number_of_pass_rushers \
3 4.0
5 4.0
6 4.0
7 5.0
8 4.0
players_on_play \
3 49410;54563;41475;52516;47812;46629;53445;41349;53480;46188;56045;44848;54609;54481;47859;44852;52535;46968;54552;48473;53565;45695
5 46657;49410;41475;54563;52516;47812;46629;53445;41349;53480;52522;46188;56045;44848;54609;54481;47859;44852;52535;48473;53565;45695
6 46657;49410;54563;41475;52516;47812;53445;46629;41349;53511;53480;52522;46188;56045;44848;54609;54481;47859;52535;48473;53565;45695
7 46657;49410;54563;41475;52516;46629;53445;41349;53480;46188;56045;48462;44848;54609;54481;47859;44852;52535;48473;44955;53565;45695
8 54721;49410;41475;52516;46629;41349;53480;46188;56045;48462;44848;54609;54481;47859;44852;52535;46968;54552;52474;44955;53565;45695
offense_players \
3 00-0037746;00-0031095;00-0036334;00-0034445;00-0031260;00-0036618;00-0037077;00-0037741;00-0035659;00-0033282;00-0033831
5 00-0031095;00-0037746;00-0036334;00-0034445;00-0031260;00-0036618;00-0037077;00-0037741;00-0035659;00-0033282;00-0033831
6 00-0037746;00-0031095;00-0036334;00-0034445;00-0031260;00-0036626;00-0036618;00-0037077;00-0037741;00-0035659;00-0033831
7 00-0037746;00-0031095;00-0036334;00-0034445;00-0031260;00-0036618;00-0037077;00-0037741;00-0035659;00-0033282;00-0033831
8 00-0031095;00-0036334;00-0034445;00-0031260;00-0036618;00-0037077;00-0037741;00-0035659;00-0033282;00-0036328;00-0033831
defense_players \
3 00-0035705;00-0035636;00-0036933;00-0034375;00-0038984;00-0033890;00-0036395;00-0034801;00-0037815;00-0035343;00-0036884
5 00-0034473;00-0035705;00-0035636;00-0036933;00-0036371;00-0034375;00-0038984;00-0033890;00-0036395;00-0035343;00-0036884
6 00-0034473;00-0035705;00-0035636;00-0036933;00-0036371;00-0034375;00-0038984;00-0033890;00-0036395;00-0035343;00-0036884
7 00-0034473;00-0035705;00-0036933;00-0034375;00-0038984;00-0035334;00-0033890;00-0036395;00-0035343;00-0033563;00-0036884
8 00-0037330;00-0035705;00-0034375;00-0038984;00-0035334;00-0033890;00-0036395;00-0034801;00-0037815;00-0033563;00-0036884
n_offense n_defense ngs_air_yards time_to_throw was_pressure route \
3 11.0 11.0 4.53 2.169 0.0 HITCH
5 11.0 11.0 9.79 2.736 0.0 IN
6 11.0 11.0 12.99 3.971 1.0 HITCH
7 11.0 11.0 -3.57 1.517 0.0 SCREEN
8 11.0 11.0 2.31 2.436 0.0 HITCH
defense_man_zone_type defense_coverage_type
3 ZONE_COVERAGE COVER_3
5 ZONE_COVERAGE COVER_4
6 ZONE_COVERAGE COVER_4
7 ZONE_COVERAGE COVER_4
8 MAN_COVERAGE COVER_1
Above you will see an example of data in our newly loaded play-by-play dataset.
Next, we will extract the QB names and important passing play data from our DataFrame.
# Gathering the DataFrame shape
print("Shape of the filtered DataFrame (rows, columns):")
print(pass_plays_df.shape)
# Gathering the DataFrame description
print("\nFirst 5 rows of the 'desc' column for pass plays:")
print(pass_plays_df['desc'].head().to_string())
# Creating a new column using regex index to filter our description column, and extract the proper data.
pass_plays_df['passer_name'] = pass_plays_df['desc'].str.extract(r'([A-Z]\.\w+)\s+(?:pass|scrambles)', flags=re.IGNORECASE)
# Checking our QB extraction success rate.
print("\nNumber of successful QB name extractions:")
print(pass_plays_df['passer_name'].notna().sum())
# Verification for our name extraction process.
print("\nExamples of extracted QB names:")
print(pass_plays_df['passer_name'].dropna().unique()[:20])
Shape of the filtered DataFrame (rows, columns): (20644, 391) First 5 rows of the 'desc' column for pass plays: 3 (14:30) (Shotgun) 14-S.Howell pass short right to 1-J.Dotson to WAS 34 for 6 yards (13-K.Clark, 10-J.Woods). 5 (13:16) (Shotgun) 14-S.Howell pass incomplete short middle to 82-L.Thomas. 6 (13:12) 14-S.Howell pass short middle to 1-J.Dotson to WAS 48 for 12 yards (13-K.Clark). 7 (12:34) (Shotgun) 14-S.Howell pass short left to 4-C.Samuel to WAS 49 for 1 yard (34-J.Thompson). 8 (11:56) (Shotgun) 14-S.Howell pass incomplete short left to 82-L.Thomas (22-K.Wallace). Number of successful QB name extractions: 18736 Examples of extracted QB names: ['S.Howell' 'J.Dobbs' 'J.Allen' 'A.Rodgers' 'Z.Wilson' 'D.Ridder' 'B.Young' 'J.Burrow' 'D.Watson' 'J.Browning' 'D.Jones' 'D.Prescott' 'C.Rush' 'T.Taylor' 'J.Goff' 'P.Mahomes' 'J.Fields' 'J.Love' 'L.Jackson' 'C.Stroud']
Above you have our preview of the successfully extracted data from our DataFrame.
# Dropping rows where a QB name wasn't extracted
pass_plays_df.dropna(subset=['passer_name'], inplace=True)
# Calculating attempts per down for every QB
down_attempts = pass_plays_df.groupby(['passer_name', 'down']).size().reset_index(name='attempts')
# Creating a pivot table to showcase & store the attempts per down
qb_down_summary_table = down_attempts.pivot_table(
index='passer_name',
columns='down',
values='attempts',
fill_value=0
)
# Renaming the columns in our newly created pivot table to match the appropriate downs we are filtering for
qb_down_summary_table = qb_down_summary_table.rename(columns={
1.0: '1st_Down_Attempts',
2.0: '2nd_Down_Attempts',
3.0: '3rd_Down_Attempts',
4.0: '4th_Down_Attempts'
})
# Displaying our findings
print("QB Pass Attempts on 1st, 2nd, 3rd, and 4th Down:")
qb_down_summary_table.head()
QB Pass Attempts on 1st, 2nd, 3rd, and 4th Down:
| down | 1st_Down_Attempts | 2nd_Down_Attempts | 3rd_Down_Attempts | 4th_Down_Attempts |
|---|---|---|---|---|
| passer_name | ||||
| A.Dalton | 21 | 19 | 18 | 0 |
| A.McCarron | 0 | 2 | 3 | 0 |
| A.Richardson | 24 | 33 | 24 | 3 |
| A.Rodgers | 1 | 0 | 0 | 0 |
| B.Gabbert | 9 | 11 | 15 | 0 |
Our QB Analysis Table showcasing passing attempts from the QB's by down.
# Filtering for description basis
pass_plays_df[['desc', 'yards_gained', 'td_team']].tail()
| desc | yards_gained | td_team | |
|---|---|---|---|
| 49656 | (4:10) (Shotgun) 15-P.Mahomes pass short right to 4-R.Rice pushed ob at SF 41 for 13 yards (7-C.Ward). | 13.0 | None |
| 49657 | (3:33) 15-P.Mahomes pass short right to 10-I.Pacheco to SF 37 for 4 yards (48-O.Burks) [97-N.Bosa]. | 4.0 | None |
| 49659 | (2:48) (Shotgun) 15-P.Mahomes pass short right to 10-I.Pacheco to SF 32 for 5 yards (54-F.Warner). | 5.0 | None |
| 49662 | (:50) (Shotgun) 15-P.Mahomes pass short middle to 87-T.Kelce to SF 3 for 7 yards (27-J.Brown; 97-N.Bosa). | 7.0 | None |
| 49663 | (:06) (Shotgun) 15-P.Mahomes pass short right to 12-M.Hardman for 3 yards, TOUCHDOWN. | 3.0 | KC |
Searching for necessary data to determine if QB pass was complete, incomplete, a TD or not, and the yards gained on the play.
# Engineering new features below.
pass_plays_df['is_completion'] = ~pass_plays_df['desc'].str.contains('incomplete', na=False, case=False)
pass_plays_df['is_interception'] = pass_plays_df['desc'].str.contains('INTERCEPTED', na=False, case=False)
pass_plays_df['is_touchdown'] = pass_plays_df['desc'].str.contains('TOUCHDOWN', na=False, case=False)
pass_plays_df['is_completion'] = np.where(pass_plays_df['is_interception'], False, pass_plays_df['is_completion'])
# Checking for newly engineered features/ columns.
pass_plays_df[['is_completion', 'is_interception', 'is_touchdown', 'desc', 'yards_gained', 'td_team']].tail(10)
| is_completion | is_interception | is_touchdown | desc | yards_gained | td_team | |
|---|---|---|---|---|---|---|
| 49643 | True | False | False | (9:25) (Shotgun) 13-B.Purdy pass short right to 44-K.Juszczyk ran ob at KC 15 for 13 yards. | 13.0 | None |
| 49646 | False | False | False | (7:29) (Shotgun) 13-B.Purdy pass incomplete short right [95-C.Jones]. | 0.0 | None |
| 49650 | True | False | False | (6:50) (Shotgun) 15-P.Mahomes pass short left to 4-R.Rice to KC 34 for 6 yards (2-D.Lenoir, 54-F.Warner). | 6.0 | None |
| 49654 | True | False | False | (5:28) (Shotgun) 15-P.Mahomes pass short right to 11-M.Valdes-Scantling to KC 39 for -3 yards (7-C.Ward). | -3.0 | None |
| 49655 | True | False | False | (4:46) (Shotgun) 15-P.Mahomes pass short left to 11-M.Valdes-Scantling to KC 46 for 7 yards (33-L.Ryan; 2-D.Lenoir). | 7.0 | None |
| 49656 | True | False | False | (4:10) (Shotgun) 15-P.Mahomes pass short right to 4-R.Rice pushed ob at SF 41 for 13 yards (7-C.Ward). | 13.0 | None |
| 49657 | True | False | False | (3:33) 15-P.Mahomes pass short right to 10-I.Pacheco to SF 37 for 4 yards (48-O.Burks) [97-N.Bosa]. | 4.0 | None |
| 49659 | True | False | False | (2:48) (Shotgun) 15-P.Mahomes pass short right to 10-I.Pacheco to SF 32 for 5 yards (54-F.Warner). | 5.0 | None |
| 49662 | True | False | False | (:50) (Shotgun) 15-P.Mahomes pass short middle to 87-T.Kelce to SF 3 for 7 yards (27-J.Brown; 97-N.Bosa). | 7.0 | None |
| 49663 | True | False | True | (:06) (Shotgun) 15-P.Mahomes pass short right to 12-M.Hardman for 3 yards, TOUCHDOWN. | 3.0 | KC |
Analysis of the play-by-play data from our "nfl_data_py" data set. I am analyzing this for the play descriptions, and results (ex: 'is_completion', 'is_touchdown', etc.).
# Performing a groupby by the QB 'passer_name', and performing multiple aggregations.
# Creating a new DataFrame for the QB Perfromance Statistics we have gained for 1st, 3rd, and 4th downs.
qb_performance_stats = pass_plays_df.groupby('passer_name').agg(
attempts=('play_id', 'count'),
completions=('is_completion', 'sum'),
passing_yards=('yards_gained', 'sum'),
touchdowns=('is_touchdown', 'sum'),
interceptions=('is_interception', 'sum')
)
# Added this variable to avoid DividedbyZero errors.
epsilon = 1e-6
# Using our aggregated totals to calculate the performance metrics.
qb_performance_stats['completion_pct'] = (qb_performance_stats['completions'] / (qb_performance_stats['attempts'] + epsilon)) * 100
qb_performance_stats['yards_per_attempt'] = qb_performance_stats['passing_yards'] / (qb_performance_stats['attempts'] + epsilon)
qb_performance_stats['touchdown_rate'] = (qb_performance_stats['touchdowns'] / (qb_performance_stats['attempts'] + epsilon)) * 100
qb_performance_stats['interception_rate'] = (qb_performance_stats['interceptions'] / (qb_performance_stats['attempts'] + epsilon)) * 100
qb_performance_stats = qb_performance_stats.round(2)
pd.set_option('display.max_rows', 100)
# Statistical Context
total_plays = len(pbp_df)
pass_plays = len(pass_plays_df[pass_plays_df['play_type'] == 'pass'])
print(f"Dataset Scale: {total_plays:,} total plays, {pass_plays:,} pass attempts")
print(f"Statistical Power: Large sample ensures robust findings")
Dataset Scale: 49,665 total plays, 18,736 pass attempts Statistical Power: Large sample ensures robust findings
# Viewing the total season metrics to compare for analysis.
print("QB Performance Metrics (Playoff Stats Added):")
qb_performance_stats.sort_values(by='passing_yards', ascending=False).head()
QB Performance Metrics (Playoff Stats Added):
| attempts | completions | passing_yards | touchdowns | interceptions | completion_pct | yards_per_attempt | touchdown_rate | interception_rate | |
|---|---|---|---|---|---|---|---|---|---|
| passer_name | |||||||||
| J.Goff | 712 | 484 | 5412.0 | 37 | 12 | 67.98 | 7.60 | 5.20 | 1.69 |
| P.Mahomes | 744 | 504 | 5234.0 | 39 | 16 | 67.74 | 7.03 | 5.24 | 2.15 |
| B.Purdy | 548 | 374 | 5054.0 | 35 | 12 | 68.25 | 9.22 | 6.39 | 2.19 |
| D.Prescott | 650 | 451 | 4919.0 | 41 | 11 | 69.38 | 7.57 | 6.31 | 1.69 |
| T.Tagovailoa | 597 | 408 | 4823.0 | 34 | 15 | 68.34 | 8.08 | 5.70 | 2.51 |
Showcasing the QB Performance Metrics stats sorted by the passing yards total (REG + POST Seasons).
# Gathering the situational statistics of the QB's by down.
qb_situational_stats = pass_plays_df.groupby(['passer_name', 'down']).agg(
attempts=('play_id', 'count'),
completions=('is_completion', 'sum'),
passing_yards=('yards_gained', 'sum'),
touchdowns=('is_touchdown', 'sum'),
interceptions=('is_interception', 'sum')
)
qb_situational_stats['completion_pct'] = (qb_situational_stats['completions'] / (qb_situational_stats['attempts'] + epsilon)) * 100
qb_situational_stats['yards_per_attempt'] = qb_situational_stats['passing_yards'] / (qb_situational_stats['attempts'] + epsilon)
qb_situational_stats['touchdown_rate'] = (qb_situational_stats['touchdowns'] / (qb_situational_stats['attempts'] + epsilon)) * 100
qb_situational_stats['interception_rate'] = (qb_situational_stats['interceptions'] / (qb_situational_stats['attempts'] + epsilon)) * 100
qb_situational_stats = qb_situational_stats.round(2)
selected_qbs = ['P.Mahomes', 'J.Allen', 'L.Jackson', 'J.Herbert', 'J.Burrow']
selected_qbs1 = ['P.Mahomes', 'J.Allen', 'L.Jackson', 'J.Burrow', 'J.Herbert', 'T.Tagovailoa', 'B.Purdy', 'M.Stafford', 'J.Hurts', 'D.Carr',
'C.Stroud', 'G.Smith', 'J.Goff', 'D.Prescott', 'J.Love', 'T.Lawrence', 'G.Minshew']
# Displaying the DataFrame of some well-known QB's by down.
print("QB Situational Performance by Down:")
top_17_df = qb_situational_stats.loc[selected_qbs1]
top_17_df.head(28)
QB Situational Performance by Down:
| attempts | completions | passing_yards | touchdowns | interceptions | completion_pct | yards_per_attempt | touchdown_rate | interception_rate | ||
|---|---|---|---|---|---|---|---|---|---|---|
| passer_name | down | |||||||||
| P.Mahomes | 1.0 | 284 | 192 | 2063.0 | 14 | 9 | 67.61 | 7.26 | 4.93 | 3.17 |
| 2.0 | 262 | 190 | 1740.0 | 9 | 2 | 72.52 | 6.64 | 3.44 | 0.76 | |
| 3.0 | 186 | 118 | 1396.0 | 16 | 4 | 63.44 | 7.51 | 8.60 | 2.15 | |
| 4.0 | 12 | 4 | 35.0 | 0 | 1 | 33.33 | 2.92 | 0.00 | 8.33 | |
| J.Allen | 1.0 | 250 | 187 | 2112.0 | 9 | 6 | 74.80 | 8.45 | 3.60 | 2.40 |
| 2.0 | 235 | 154 | 1601.0 | 14 | 4 | 65.53 | 6.81 | 5.96 | 1.70 | |
| 3.0 | 150 | 86 | 934.0 | 9 | 7 | 57.33 | 6.23 | 6.00 | 4.67 | |
| 4.0 | 11 | 5 | 48.0 | 2 | 1 | 45.45 | 4.36 | 18.18 | 9.09 | |
| L.Jackson | 1.0 | 207 | 146 | 1921.0 | 8 | 1 | 70.53 | 9.28 | 3.86 | 0.48 |
| 2.0 | 181 | 127 | 1316.0 | 10 | 5 | 70.17 | 7.27 | 5.52 | 2.76 | |
| 3.0 | 122 | 68 | 824.0 | 10 | 2 | 55.74 | 6.75 | 8.20 | 1.64 | |
| 4.0 | 5 | 2 | 41.0 | 1 | 0 | 40.00 | 8.20 | 20.00 | 0.00 | |
| J.Burrow | 1.0 | 142 | 95 | 929.0 | 6 | 3 | 66.90 | 6.54 | 4.23 | 2.11 |
| 2.0 | 128 | 84 | 805.0 | 4 | 2 | 65.62 | 6.29 | 3.12 | 1.56 | |
| 3.0 | 89 | 61 | 550.0 | 5 | 1 | 68.54 | 6.18 | 5.62 | 1.12 | |
| 4.0 | 5 | 4 | 25.0 | 0 | 0 | 80.00 | 5.00 | 0.00 | 0.00 | |
| J.Herbert | 1.0 | 165 | 116 | 1079.0 | 4 | 1 | 70.30 | 6.54 | 2.42 | 0.61 |
| 2.0 | 156 | 108 | 1102.0 | 3 | 2 | 69.23 | 7.06 | 1.92 | 1.28 | |
| 3.0 | 121 | 65 | 865.0 | 8 | 4 | 53.72 | 7.15 | 6.61 | 3.31 | |
| 4.0 | 14 | 7 | 88.0 | 5 | 0 | 50.00 | 6.29 | 35.71 | 0.00 | |
| T.Tagovailoa | 1.0 | 214 | 151 | 1738.0 | 7 | 5 | 70.56 | 8.12 | 3.27 | 2.34 |
| 2.0 | 201 | 141 | 1490.0 | 13 | 2 | 70.15 | 7.41 | 6.47 | 1.00 | |
| 3.0 | 165 | 105 | 1466.0 | 14 | 8 | 63.64 | 8.88 | 8.48 | 4.85 | |
| 4.0 | 17 | 11 | 129.0 | 0 | 0 | 64.71 | 7.59 | 0.00 | 0.00 | |
| B.Purdy | 1.0 | 228 | 166 | 2379.0 | 10 | 7 | 72.81 | 10.43 | 4.39 | 3.07 |
| 2.0 | 183 | 125 | 1511.0 | 17 | 2 | 68.31 | 8.26 | 9.29 | 1.09 | |
| 3.0 | 133 | 79 | 1141.0 | 8 | 3 | 59.40 | 8.58 | 6.02 | 2.26 | |
| 4.0 | 4 | 4 | 23.0 | 0 | 0 | 100.00 | 5.75 | 0.00 | 0.00 |
This table showcases your Top 17 NFL QB's from the 2023 Season's passing statistics by down.
# Passing Yards Visualization by the Top 17 QB's in the 2023 NFL Season.
plt.figure(figsize=(10,8))
sns.set_style("darkgrid")
sns.barplot(data=top_17_df,
x='passing_yards', y='passer_name', hue='passer_name', palette='YlGnBu')
plt.title("Average Passing Yards Per Down (Season) (NFL 2023)")
plt.ylabel("Names")
plt.xlabel("Yards")
plt.show()
# Passing Yards Visualization by the Top 17 QB's in the 2023 NFL Season.
plt.figure(figsize=(10,8))
sns.boxplot(data=top_17_df,
x='completion_pct', y='passer_name', hue='passer_name', palette='YlGnBu')
plt.title("Completion Percentage Distribution (NFL 2023)")
plt.ylabel("Names")
plt.xlabel("Completion %")
plt.show()
Visual Analysis: Quarterback Performance Metrics¶
To begin our analysis, we visualized the performance of the top 17 quarterbacks from the 2023 season based on two fundamental metrics: their average passing yards per attempt and the distribution of their completion percentages.
1. Average Passing Yards per Attempt
The first chart is a bar plot that displays the average passing yards per attempt for each of the top 17 quarterbacks.
- Purpose: This visualization provides a straightforward ranking of which quarterbacks generate the most yardage on a typical pass play. The black error bars also give a sense of the variability around their average.
- Insight: A quick look at this chart immediately identifies the most explosive passers on a per-play basis, setting the stage for deeper questions about efficiency versus consistency.
2. Completion Percentage Distribution
For our second visualization, we chose a box plot to analyze each quarterback's completion percentage across all their pass attempts.
- Purpose: Unlike a simple bar chart showing the average, a box plot reveals the distribution and consistency of a quarterback's performance. The box represents the middle 50% of their completion percentages on a per-game basis, the line inside the box shows their median, and the whiskers show the range of their performance.
- Insight: This allows us to compare quarterbacks more deeply. A quarterback with a high median and a tight box is not just accurate, but consistently accurate. Conversely, a wide box might indicate a "boom-or-bust" passer. This view provides a much more nuanced understanding of quarterback reliability than a single average number ever could.
Further Analysis & Visualization¶
Analysis: How Do Quarterbacks Perform When the Pressure Mounts?¶
This initial analysis seeks to answer a fundamental question: Do quarterbacks get better or worse when facing the high-stakes pressure of a 3rd down compared to a standard 1st down? We establish a baseline performance for each quarterback on 1st down and then measure the change in their completion percentage and yards per attempt on 3rd down. A positive change indicates a player who elevates their game, while a negative change suggests they struggle under pressure.
# --- Data Loading and Preparation (from your notebook) ---
# Ensure the base DataFrame is loaded and prepared as you did before.
try:
pbp_df = nfl.import_pbp_data([2023])
except Exception as e:
print(f"Could not load data, using a local file as a fallback. Error: {e}")
# As a fallback for environments without internet, you might load a saved CSV
# pbp_df = pd.read_csv('pbp_2023.csv')
conditions = (pbp_df['play_type'] == 'pass') & (pbp_df['down'].isin([1.0, 2.0, 3.0, 4.0]))
pass_plays_df = pbp_df.loc[conditions].copy()
# Extract passer name using regex
pass_plays_df['passer_name'] = pass_plays_df['desc'].str.extract(r'([A-Z]\.\w+)\s+(?:pass|scrambles)', flags=re.IGNORECASE)
pass_plays_df.dropna(subset=['passer_name'], inplace=True)
# Feature Engineering for outcomes
pass_plays_df['is_completion'] = ~pass_plays_df['desc'].str.contains('incomplete', na=False, case=False)
pass_plays_df['is_interception'] = pass_plays_df['desc'].str.contains('INTERCEPTED', na=False, case=False)
pass_plays_df['is_touchdown'] = pass_plays_df['desc'].str.contains('TOUCHDOWN', na=False, case=False)
pass_plays_df['is_completion'] = np.where(pass_plays_df['is_interception'], False, pass_plays_df['is_completion'])
epsilon = 1e-6 # To avoid division by zero
# --- Performance Difference DataFrame (1st vs 3rd Down) ---
# Filter for only 1st and 3rd down pass plays
down_1_and_3_df = pass_plays_df[pass_plays_df['down'].isin([1.0, 3.0])].copy()
# Group by QB and down to get stats
situational_stats = down_1_and_3_df.groupby(['passer_name', 'down']).agg(
attempts=('play_id', 'count'),
completions=('is_completion', 'sum')
).reset_index()
# Calculate performance metrics
situational_stats['completion_pct'] = (situational_stats['completions'] / (situational_stats['attempts'] + epsilon)) * 100
# Pivot the table to get downs as columns
qb_pivot = situational_stats.pivot_table(
index='passer_name',
columns='down',
values=['completion_pct'],
fill_value=0
)
# Flatten the multi-index columns
qb_pivot.columns = [f'{stat}_{int(down)}' for stat, down in qb_pivot.columns]
# Calculate the delta (3rd down performance - 1st down performance)
qb_pivot['cmp_pct_delta'] = qb_pivot['completion_pct_3'] - qb_pivot['completion_pct_1']
# Filter for QBs with a reasonable number of attempts on both downs for meaningful comparison
attempt_counts = situational_stats.pivot_table(index='passer_name', columns='down', values='attempts', fill_value=0)
qualified_qbs = attempt_counts[(attempt_counts[1.0] >= 50) & (attempt_counts[3.0] >= 50)].index
performance_delta_df = qb_pivot.loc[qualified_qbs].sort_values(by='cmp_pct_delta', ascending=False)
print("--- Performance Delta: 1st vs 3rd Down ---")
performance_delta_df[['cmp_pct_delta']].round(2).head(19)
2023 done. Downcasting floats. --- Performance Delta: 1st vs 3rd Down ---
| cmp_pct_delta | |
|---|---|
| passer_name | |
| J.Hurts | 5.20 |
| T.DeVito | 4.49 |
| J.Burrow | 1.64 |
| K.Murray | 0.81 |
| J.Fields | 0.56 |
| G.Minshew | -1.31 |
| D.Watson | -1.53 |
| J.Goff | -2.19 |
| K.Cousins | -2.71 |
| R.Wilson | -2.81 |
| B.Mayfield | -3.34 |
| M.Stafford | -3.42 |
| D.Prescott | -3.50 |
| J.Dobbs | -3.95 |
| P.Mahomes | -4.16 |
| W.Levis | -4.91 |
| R.Tannehill | -5.06 |
| S.Howell | -5.10 |
| D.Ridder | -5.13 |
Visualization 1: Bar Chart of Performance Change¶
This chart visualizes the change in completion percentage from 1st to 3rd down. The diverging vlag color palette immediately draws the eye:
Blue bars represent quarterbacks who improve their completion percentage on 3rd down.
Red bars represent those whose performance declines.
This provides a quick, high-level overview of which players rise to the occasion.
# Assuming performance_delta_df is pre-loaded and prepared
# placeholder for the dataframe if it's not loaded:
# performance_delta_df = pd.DataFrame({'cmp_pct_delta': [10, 5, -5, -10]}, index=['QB1', 'QB2', 'QB3', 'QB4'])
plt.figure(figsize=(12, 10))
sns.set_style("whitegrid")
delta_plot = sns.barplot(
data=performance_delta_df,
x='cmp_pct_delta',
y=performance_delta_df.index,
palette='vlag', # A diverging palette is great for showing positive/negative change
orient='h'
)
plt.title('Change in Completion % (3rd Down vs. 1st Down)', fontsize=16, fontweight='bold')
plt.xlabel('Completion Percentage Point Difference', fontsize=12)
plt.ylabel('Quarterback', fontsize=12)
plt.axvline(0, color='black', linewidth=0.8) # Add a line at zero for reference
plt.tight_layout()
plt.show()
/var/folders/3j/nldb3n550ml79m4rq2xbs5pc0000gn/T/ipykernel_9395/3152937869.py:7: FutureWarning: Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `y` variable to `hue` and set `legend=False` for the same effect.
Visualization 2: Dumbbell Plot for Direct Comparison¶
The dumbbell plot offers a more detailed and direct comparison. For each quarterback, it plots their 1st down completion percentage (blue dot) and their 3rd down percentage (green dot) on the same line.
The connecting line makes it easy to see the magnitude and direction of the change.
A green dot to the right of the blue dot signifies improvement.
The red dashed line shows the league average on 1st down, providing crucial context for evaluating a player's baseline performance.
# Assuming performance_delta_df is pre-loaded and prepared
# placeholder for the dataframe if it's not loaded:
# performance_delta_df = pd.DataFrame({
# 'completion_pct_1': [60, 65, 70, 75],
# 'completion_pct_3': [70, 68, 65, 60],
# 'cmp_pct_delta': [10, 3, -5, -15]
# }, index=['QB1', 'QB2', 'QB3', 'QB4'])
# Sort the dataframe for better visualization
df_plot = performance_delta_df.sort_values('cmp_pct_delta', ascending=True)
# Create the figure and axes
fig, ax = plt.subplots(figsize=(12, 10))
sns.set_style("whitegrid")
# Plot the lines connecting the points (the "bar" of the dumbbell)
ax.hlines(y=df_plot.index, xmin=df_plot['completion_pct_1'], xmax=df_plot['completion_pct_3'],
color='grey', alpha=0.4)
# Plot the points for 1st down and 3rd down
ax.scatter(df_plot['completion_pct_1'], df_plot.index, color='skyblue', alpha=1, s=100, label='1st Down Cmp %')
ax.scatter(df_plot['completion_pct_3'], df_plot.index, color='green', alpha=1, s=100, label='3rd Down Cmp %')
# Add labels and title
ax.legend()
ax.set_title('QB Completion Percentage: 1st Down vs. 3rd Down', fontsize=16, fontweight='bold')
ax.set_xlabel('Completion Percentage (%)', fontsize=12)
ax.set_ylabel('Quarterback', fontsize=12)
# Add a vertical line for the average 1st down completion % for context
avg_cmp_pct_1st = df_plot['completion_pct_1'].mean()
ax.axvline(x=avg_cmp_pct_1st, color='red', linestyle='--', linewidth=0.8, label=f'Avg 1st Down Cmp % ({avg_cmp_pct_1st:.1f}%)')
ax.legend()
plt.tight_layout()
plt.show()
Analysis: Pinpointing True "Clutch" Performance¶
Here, we narrow our focus to the most critical moments of a game: the final two minutes of a close contest (within one score). Performance in these high-leverage moments is what often defines a quarterback's legacy. We calculate standard performance metrics (completion percentage, yards, TDs, INTs) specifically within this "clutch" window to identify the league's most reliable late-game performers.
import pandas as pd
import numpy as np
# Assuming pass_plays_df is pre-loaded and prepared
# Placeholder for the dataframe if it's not loaded
# pass_plays_df = pd.DataFrame({
# 'game_seconds_remaining': [100, 200, 50, 150],
# 'score_differential': [3, 10, -7, -5],
# 'passer_name': ['QB1', 'QB2', 'QB1', 'QB3'],
# 'play_id': [1, 2, 3, 4],
# 'is_completion': [True, False, True, True],
# 'yards_gained': [15, 0, 25, 10],
# 'is_touchdown': [False, False, True, False],
# 'is_interception': [False, False, False, False]
# })
# epsilon = 1e-6
# Define "clutch" situations
is_clutch_time = pass_plays_df['game_seconds_remaining'] <= 120
is_close_game = pass_plays_df['score_differential'].between(-8, 7)
clutch_plays_df = pass_plays_df[is_clutch_time & is_close_game].copy()
# Aggregate performance in these clutch situations
clutch_performance_df = clutch_plays_df.groupby('passer_name').agg(
clutch_attempts=('play_id', 'count'),
clutch_completions=('is_completion', 'sum'),
clutch_yards=('yards_gained', 'sum'),
clutch_tds=('is_touchdown', 'sum'),
clutch_ints=('is_interception', 'sum')
)
# Calculate clutch performance rates
clutch_performance_df['clutch_cmp_pct'] = (clutch_performance_df['clutch_completions'] / (clutch_performance_df['clutch_attempts'] + epsilon)) * 100
clutch_performance_df['clutch_ypa'] = clutch_performance_df['clutch_yards'] / (clutch_performance_df['clutch_attempts'] + epsilon)
# Filter for QBs with at least 10 clutch attempts and sort
clutch_performance_df = clutch_performance_df[clutch_performance_df['clutch_attempts'] >= 10].sort_values(by='clutch_cmp_pct', ascending=False)
print("--- Clutch Performance: Last 2 Mins, Close Games ---")
clutch_performance_df.round(2)
--- Clutch Performance: Last 2 Mins, Close Games ---
| clutch_attempts | clutch_completions | clutch_yards | clutch_tds | clutch_ints | clutch_cmp_pct | clutch_ypa | |
|---|---|---|---|---|---|---|---|
| passer_name | |||||||
| C.Stroud | 14 | 12 | 182.0 | 2 | 0 | 85.71 | 13.00 |
| J.Goff | 20 | 17 | 163.0 | 1 | 0 | 85.00 | 8.15 |
| D.Ridder | 18 | 15 | 214.0 | 0 | 1 | 83.33 | 11.89 |
| G.Smith | 32 | 21 | 287.0 | 3 | 0 | 65.62 | 8.97 |
| K.Murray | 14 | 9 | 116.0 | 0 | 0 | 64.29 | 8.29 |
| L.Jackson | 11 | 7 | 95.0 | 1 | 0 | 63.64 | 8.64 |
| J.Browning | 11 | 7 | 72.0 | 1 | 0 | 63.64 | 6.55 |
| M.Jones | 26 | 16 | 155.0 | 1 | 1 | 61.54 | 5.96 |
| B.Zappe | 13 | 8 | 70.0 | 0 | 1 | 61.54 | 5.38 |
| B.Mayfield | 20 | 12 | 152.0 | 2 | 1 | 60.00 | 7.60 |
| J.Herbert | 20 | 12 | 160.0 | 0 | 1 | 60.00 | 8.00 |
| D.Prescott | 10 | 6 | 76.0 | 0 | 0 | 60.00 | 7.60 |
| T.Taylor | 21 | 12 | 76.0 | 0 | 1 | 57.14 | 3.62 |
| T.Tagovailoa | 16 | 9 | 51.0 | 1 | 1 | 56.25 | 3.19 |
| S.Howell | 28 | 15 | 167.0 | 3 | 1 | 53.57 | 5.96 |
| R.Wilson | 29 | 15 | 184.0 | 2 | 1 | 51.72 | 6.34 |
| J.Hurts | 22 | 11 | 99.0 | 1 | 3 | 50.00 | 4.50 |
| J.Allen | 12 | 6 | 65.0 | 1 | 0 | 50.00 | 5.42 |
| B.Purdy | 12 | 6 | 62.0 | 0 | 1 | 50.00 | 5.17 |
| K.Cousins | 12 | 6 | 57.0 | 0 | 1 | 50.00 | 4.75 |
| D.Lock | 10 | 5 | 92.0 | 1 | 0 | 50.00 | 9.20 |
| G.Minshew | 10 | 5 | 33.0 | 0 | 0 | 50.00 | 3.30 |
| B.Young | 10 | 5 | 61.0 | 0 | 0 | 50.00 | 6.10 |
| J.Dobbs | 21 | 10 | 105.0 | 1 | 0 | 47.62 | 5.00 |
| W.Levis | 19 | 9 | 92.0 | 0 | 1 | 47.37 | 4.84 |
| Z.Wilson | 17 | 8 | 137.0 | 0 | 1 | 47.06 | 8.06 |
| M.Stafford | 15 | 7 | 76.0 | 0 | 0 | 46.67 | 5.07 |
| P.Mahomes | 30 | 13 | 124.0 | 2 | 0 | 43.33 | 4.13 |
| J.Love | 27 | 11 | 120.0 | 1 | 4 | 40.74 | 4.44 |
| D.Carr | 18 | 7 | 84.0 | 0 | 1 | 38.89 | 4.67 |
| T.Siemian | 11 | 4 | 48.0 | 0 | 0 | 36.36 | 4.36 |
| J.Fields | 15 | 5 | 99.0 | 0 | 2 | 33.33 | 6.60 |
Analysis: A Holistic View of High-Pressure Play¶
This section uses the official NFL Passer Rating formula to provide a single, comprehensive metric for performance in the most challenging situations: on 3rd or 4th down, in the 4th quarter or overtime, of a close game. Passer Rating is ideal because it balances completion percentage, yards per attempt, touchdown rate, and interception rate into one number.
The resulting visualization ranks quarterbacks by this holistic score, with the number of attempts annotated to provide crucial context. A high rating on few attempts is less meaningful than a high rating on many attempts.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Assuming pass_plays_df is pre-loaded and prepared
# Placeholder for the dataframe if it's not loaded
# pass_plays_df = pd.DataFrame({
# 'qtr': [4, 3, 4, 4],
# 'score_differential': [3, 10, -7, -5],
# 'down': [3, 4, 4, 3],
# 'passer_name': ['QB1', 'QB2', 'QB1', 'QB3'],
# 'play_id': [1, 2, 3, 4],
# 'is_completion': [True, False, True, True],
# 'yards_gained': [15, 0, 25, 10],
# 'is_touchdown': [False, False, True, False],
# 'is_interception': [False, False, False, False]
# })
# epsilon = 1e-6
# is_close_game = pass_plays_df['score_differential'].between(-8, 7)
# Define high-pressure situations
is_late_game = (pass_plays_df['qtr'] >= 4)
is_late_and_close = is_late_game & is_close_game
is_high_leverage_down = pass_plays_df['down'].isin([3.0, 4.0])
high_pressure_df = pass_plays_df[is_late_and_close & is_high_leverage_down].copy()
# Aggregate performance
high_pressure_stats = high_pressure_df.groupby('passer_name').agg(
hp_attempts=('play_id', 'count'),
hp_completions=('is_completion', 'sum'),
hp_yards=('yards_gained', 'sum'),
hp_tds=('is_touchdown', 'sum'),
hp_ints=('is_interception', 'sum')
)
# Calculate Passer Rating components
a = ((high_pressure_stats['hp_completions'] / (high_pressure_stats['hp_attempts'] + epsilon)) - 0.3) * 5
b = ((high_pressure_stats['hp_yards'] / (high_pressure_stats['hp_attempts'] + epsilon)) - 3) * 0.25
c = (high_pressure_stats['hp_tds'] / (high_pressure_stats['hp_attempts'] + epsilon)) * 20
d = 2.375 - ((high_pressure_stats['hp_ints'] / (high_pressure_stats['hp_attempts'] + epsilon)) * 25)
# Clip each component
a = np.clip(a, 0, 2.375); b = np.clip(b, 0, 2.375); c = np.clip(c, 0, 2.375); d = np.clip(d, 0, 2.375)
high_pressure_stats['passer_rating'] = ((a + b + c + d) / 6) * 100
# Filter and sort
hp_viz_df = high_pressure_stats[high_pressure_stats['hp_attempts'] >= 15].sort_values(by='passer_rating', ascending=False)
# Create the visualization
plt.figure(figsize=(12, 10))
sns.set_style("whitegrid")
ax = sns.barplot(data=hp_viz_df, x='passer_rating', y=hp_viz_df.index, palette='coolwarm_r')
plt.title("QB Passer Rating on High-Pressure Downs\n(4th Qtr/OT, Close Game, 3rd/4th Down)", fontsize=16, fontweight='bold')
plt.xlabel("Passer Rating", fontsize=12)
plt.ylabel("Quarterback", fontsize=12)
# Add attempt count as annotation
for i, (p, count) in enumerate(zip(ax.patches, hp_viz_df['hp_attempts'])):
ax.text(p.get_width() + 1, i, f'({count} att)', va='center', fontsize=9)
plt.tight_layout()
plt.show()
/var/folders/3j/nldb3n550ml79m4rq2xbs5pc0000gn/T/ipykernel_9395/2366961600.py:55: FutureWarning: Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `y` variable to `hue` and set `legend=False` for the same effect.
Analysis: Comparing the League's Best, Down by Down¶
Finally, we zoom in on a handful of quarterbacks widely considered to be "elite." This analysis moves away from comparing situational changes and instead provides a direct comparison of their raw performance on every down.
The grouped bar charts allow for an easy, side-by-side comparison. We can now see which of these top-tier players maintains consistency across all downs and which ones might have a specific down where they tend to struggle or excel.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Assuming pass_plays_df and epsilon are defined from previous cells
# First, we need to recalculate the situational stats for all downs
all_down_stats = pass_plays_df.groupby(['passer_name', 'down']).agg(
attempts=('play_id', 'count'),
completions=('is_completion', 'sum'),
passing_yards=('yards_gained', 'sum')
).reset_index()
# Calculate performance metrics
all_down_stats['completion_pct'] = (all_down_stats['completions'] / (all_down_stats['attempts'] + epsilon)) * 100
all_down_stats['yards_per_attempt'] = all_down_stats['passing_yards'] / (all_down_stats['attempts'] + epsilon)
# Prepare the data for plotting
selected_qbs = ['P.Mahomes', 'J.Allen', 'J.Burrow', 'J.Herbert', 'L.Jackson', 'G.Smith', 'J.Hurts', 'D.Prescott']
plot_data = all_down_stats[
all_down_stats['passer_name'].isin(selected_qbs) &
all_down_stats['down'].isin([1.0, 2.0, 3.0, 4.0])
]
# --- THE FIX: Define a NEW green, high-contrast palette ---
green_palette = ['#1b5e20', '#4caf50', '#81c784', '#a5d6a7']
# Create the Visualization with the new custom color scheme
fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(15, 18))
sns.set_style("darkgrid")
# --- Plot 1: Completion Percentage by Down ---
sns.barplot(
data=plot_data,
x='passer_name',
y='completion_pct',
hue='down',
ax=axes[0],
palette=green_palette # Use our NEW green palette
)
axes[0].set_title('Completion % by Down for Elite QBs', fontsize=18, pad=20)
axes[0].set_ylabel('Completion %', fontsize=14, labelpad=15)
axes[0].set_xlabel('')
axes[0].tick_params(axis='x', rotation=0, labelsize=14)
axes[0].tick_params(axis='y', labelsize=12)
axes[0].legend(title='Down', fontsize=12, title_fontsize=14)
# --- Plot 2: Yards Per Attempt by Down ---
sns.barplot(
data=plot_data,
x='passer_name',
y='yards_per_attempt',
hue='down',
ax=axes[1],
palette=green_palette # Use our NEW green palette here as well
)
axes[1].set_title('Yards Per Attempt by Down for Elite QBs', fontsize=18, pad=20)
axes[1].set_ylabel('Yards Per Attempt', fontsize=14, labelpad=15)
axes[1].set_xlabel('Quarterback', fontsize=14, labelpad=15)
axes[1].tick_params(axis='x', rotation=0, labelsize=14)
axes[1].tick_params(axis='y', labelsize=12)
axes[1].legend(title='Down', fontsize=12, title_fontsize=14)
fig.subplots_adjust(hspace=0.4, top=0.94, bottom=0.08, left=0.1, right=0.95)
plt.show()
Visualization: The 4th Down Matrix - Trust vs. Success vs. Win Impact¶
This final, interactive matrix provides our most nuanced view of 4th down performance. By shifting the primary success metric from simple completions to actual conversions, we can more accurately identify which quarterbacks truly deliver when it matters most.
The chart is built on three key dimensions:
Trust (X-Axis): The raw number of 4th down attempts a quarterback has. A higher number means the coaching staff trusts them more in these situations.
Success (Y-Axis): The quarterback's 4th down conversion rate. This is the ultimate measure of their effectiveness on the money down.
Win Impact (Color): The color of each bubble represents a calculated "Win Conversion Impact" score. This score is a product of the quarterback's conversion rate and the number of their 4th down attempts that occurred in eventual wins. A brighter, hotter color signifies a player whose successful conversions have a direct and significant impact on their team winning the game.
It is vital, however, to view this as one powerful variable in the complex equation of what makes a quarterback "clutch." For instance, a player like Geno Smith, who led the league in combined 4th Quarter Comebacks and Game-Winning Drives, may not appear as a top outlier here due to a lower volume of 4th down attempts. This highlights that true clutch performance is multi-faceted and should be evaluated through a variety of analytical lenses.
# If you haven't installed the library yet, uncomment the line below
# !pip install plotly
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
# Set the default renderer to be compatible with most notebook environments
pio.renderers.default = "iframe"
# --- SELF-CONTAINED DATA PREPARATION ---
# This section ensures all necessary data is created within this cell, preventing errors.
# Assuming pass_plays_df is pre-loaded and prepared
# Placeholder for the dataframe if it's not loaded
# pass_plays_df = pd.DataFrame({
# 'down': [4, 4, 4, 4], 'passer_name': ['QB1', 'QB1', 'QB2', 'QB2'],
# 'result': [3, -3, 10, 10], 'play_id': [1,2,3,4], 'is_completion': [True, False, True, True],
# 'first_down_pass': [1.0, 0.0, 1.0, 1.0]
# })
epsilon = 1e-6
# 1. Start with all 4th down plays
fourth_down_df = pass_plays_df[pass_plays_df['down'] == 4.0].copy()
fourth_down_df['is_conversion'] = np.where(fourth_down_df['first_down_pass'] == 1.0, 1, 0)
# 2. Aggregate base stats
fourth_down_stats = fourth_down_df.groupby('passer_name').agg(
fourth_down_attempts=('play_id', 'count'),
fourth_down_completions=('is_completion', 'sum'),
fourth_down_conversions=('is_conversion', 'sum')
)
# 3. Calculate both conversion and completion rates
fourth_down_stats['fourth_down_conversion_rate'] = (fourth_down_stats['fourth_down_conversions'] / (fourth_down_stats['fourth_down_attempts'] + epsilon)) * 100
fourth_down_stats['fourth_down_cmp_pct'] = (fourth_down_stats['fourth_down_completions'] / (fourth_down_stats['fourth_down_attempts'] + epsilon)) * 100
# 4. Filter for qualified QBs
qualified_fourth_down_qbs = fourth_down_stats[fourth_down_stats['fourth_down_attempts'] >= 5]
# --- Engineer the "Win Factor" and "Win Conversion Impact Score" ---
fourth_down_df['game_outcome'] = np.where(fourth_down_df['result'] > 0, 'Win', 'Loss')
win_loss_stats = fourth_down_df.groupby(['passer_name', 'game_outcome']).agg(
attempts=('play_id', 'count')
).unstack(fill_value=0)
if ('attempts', 'Win') not in win_loss_stats.columns: win_loss_stats[('attempts', 'Win')] = 0
if ('attempts', 'Loss') not in win_loss_stats.columns: win_loss_stats[('attempts', 'Loss')] = 0
win_loss_stats.columns = [f'{stat}_{outcome.lower()}' for stat, outcome in win_loss_stats.columns]
win_loss_stats.rename(columns={'attempts_win': 'win_attempts', 'attempts_loss': 'loss_attempts'}, inplace=True)
# Merge this new data into our main plotting DataFrame
plot_df = qualified_fourth_down_qbs.merge(
win_loss_stats[['win_attempts', 'loss_attempts']],
left_index=True,
right_index=True,
how='left'
).fillna(0)
# --- NEW LOGIC: Calculate the Win CONVERSION Impact Score ---
# This score now combines conversion rate with volume in wins
plot_df['win_conversion_impact'] = (plot_df['fourth_down_conversion_rate'] / 100) * plot_df['win_attempts']
plot_df.reset_index(inplace=True)
plot_df.rename(columns={'index': 'passer_name'}, inplace=True)
# --- Create the Updated Interactive Trust vs. CONVERSION Matrix ---
median_attempts = plot_df['fourth_down_attempts'].median()
# --- NEW LOGIC: Use median of conversion rate for the line ---
median_conv_rate = plot_df['fourth_down_conversion_rate'].median()
fig = px.scatter(
plot_df,
x='fourth_down_attempts',
y='fourth_down_conversion_rate', # Y-AXIS IS NOW CONVERSION RATE
color='win_conversion_impact', # Color is now based on conversion impact
size='fourth_down_attempts',
color_continuous_scale='Plasma',
hover_name='passer_name',
hover_data={
'passer_name': False, 'fourth_down_attempts': ':.0f',
'fourth_down_conversion_rate': ':.1f', # Show conversion rate on hover
'fourth_down_cmp_pct': ':.1f', # Keep completion % for context
'win_conversion_impact': ':.2f', 'win_attempts': True,
},
labels={
"fourth_down_attempts": "Total 4th Down Attempts (Trust)",
"fourth_down_conversion_rate": "4th Down Conversion % (Success)", # Updated Label
"win_conversion_impact": "Win Conversion Impact",
"win_attempts": "Attempts in Wins",
"fourth_down_cmp_pct": "Completion %"
}
)
# Add median lines and quadrant labels
fig.add_vline(x=median_attempts, line_width=2, line_dash="dash", line_color="blue")
fig.add_hline(y=median_conv_rate, line_width=2, line_dash="dash", line_color="blue") # Use new median
fig.add_annotation(x=plot_df['fourth_down_attempts'].max(), y=plot_df['fourth_down_conversion_rate'].max(), text="<b>Go-To Converters</b>", showarrow=False, xanchor='right', yanchor='top', font=dict(size=16, color="white"), bgcolor="black", opacity=0.7)
fig.add_annotation(x=plot_df['fourth_down_attempts'].min(), y=plot_df['fourth_down_conversion_rate'].max(), text="<b>Efficient Specialists</b>", showarrow=False, xanchor='left', yanchor='top', font=dict(size=16, color="white"), bgcolor="black", opacity=0.7)
fig.add_annotation(x=plot_df['fourth_down_attempts'].max(), y=plot_df['fourth_down_conversion_rate'].min(), text="<b>High-Volume Gamblers</b>", showarrow=False, xanchor='right', yanchor='bottom', font=dict(size=16, color="white"), bgcolor="black", opacity=0.7)
fig.add_annotation(x=plot_df['fourth_down_attempts'].min(), y=plot_df['fourth_down_conversion_rate'].min(), text="<b>Last Resorts</b>", showarrow=False, xanchor='left', yanchor='bottom', font=dict(size=16, color="white"), bgcolor="black", opacity=0.7)
# --- UPDATED LAYOUT ---
# Polish the plot
fig.update_layout(
title_text='<b>4th Down Matrix: Trust vs. Success vs. Win Impact</b>', title_x=0.5,
xaxis_title='Trust (Total 4th Down Attempts)', yaxis_title='Success (4th Down Conversion %)', # Updated Title
font=dict(family="Arial, sans-serif", size=12), width=950, height=750, showlegend=False,
coloraxis_colorbar=dict(
title=dict(
text="Win Conversion Impact<br>(Conv% x Win Attempts)",
font=dict(size=12), # Slightly smaller font for the title
side="right" # Ensure title is on the right side of the bar
),
tickfont=dict(size=12),
len=0.8,
y=0.5,
yanchor='middle'
),
margin=dict(r=200) # Increased the right margin to give the colorbar title more space
)
fig.show(config={'displayModeBar': False}, renderer='notebook')
For further analysis, we will need to combine our performance delta and clutch performance variables.
- See Below :
# Merge the two DataFrames using their indices as the join key
merged_performance_df = pd.merge(
performance_delta_df,
clutch_performance_df,
left_index=True, # Use the index from the left DataFrame
right_index=True, # Use the index from the right DataFrame
how='inner'
)
# After merging, the passer_name is still the index.
# Let's turn it into a regular column so it's easy to use for plotting.
merged_performance_df.reset_index(inplace=True)
# Verify the result
print("Merged DataFrame created successfully!")
print(f"Shape of the new DataFrame: {merged_performance_df.shape}")
print("First 5 rows:")
merged_performance_df.head()
Merged DataFrame created successfully! Shape of the new DataFrame: (30, 11) First 5 rows:
| passer_name | completion_pct_1 | completion_pct_3 | cmp_pct_delta | clutch_attempts | clutch_completions | clutch_yards | clutch_tds | clutch_ints | clutch_cmp_pct | clutch_ypa | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | J.Hurts | 61.214953 | 66.417910 | 5.202957 | 22 | 11 | 99.0 | 1 | 3 | 49.999998 | 4.500000 |
| 1 | K.Murray | 67.307692 | 68.115941 | 0.808249 | 14 | 9 | 116.0 | 0 | 0 | 64.285710 | 8.285714 |
| 2 | J.Fields | 58.695652 | 59.259259 | 0.563607 | 15 | 5 | 99.0 | 0 | 2 | 33.333331 | 6.600000 |
| 3 | G.Minshew | 60.818713 | 59.504132 | -1.314581 | 10 | 5 | 33.0 | 0 | 0 | 49.999995 | 3.300000 |
| 4 | J.Goff | 69.629629 | 67.441860 | -2.187769 | 20 | 17 | 163.0 | 1 | 0 | 84.999996 | 8.150000 |
Advanced Analysis: Identifying Clutch Performers¶
To move beyond traditional volume and efficiency stats, we created a scatter plot to analyze quarterback performance under two distinct types of pressure: situational pressure (3rd downs) and late-game pressure ("clutch" time). This visualization helps us identify not just who performs well, but how and when they elevate their game.
Chart Methodology:
This scatter plot positions quarterbacks based on two custom-engineered metrics:
- X-Axis (Situational Improvement): This axis represents the
Completion % Delta, calculated as the difference between a QB's completion percentage on 3rd down versus their 1st down baseline. A positive value (right side of the chart) indicates a quarterback who becomes more accurate on high-leverage 3rd downs. - Y-Axis (Clutch Performance): This axis shows the raw
Completion % in Clutch Time, defined as plays within the last two minutes of a close game (score differential of 8 points or less). A high value indicates a reliable performer when the game is on the line.
The chart is divided into four quadrants by the league-average lines for both metrics, allowing us to categorize QB performance profiles.
Interpreting the Quadrants:
- Top-Right (Dual-Threat Clutch): Quarterbacks in this quadrant are the elite clutch performers. They are not only highly accurate in late-game situations but also elevate their performance on critical 3rd downs.
- Top-Left (Late-Game Specialists): These quarterbacks excel when the game is on the line but, interestingly, show a decline in accuracy on 3rd downs compared to their 1st down baseline. They are clutch, but not necessarily consistent situational risers.
- Bottom-Right (Situational Risers): These players handle the pressure of 3rd downs well, improving their accuracy, but have struggled to maintain that performance in the final, decisive moments of close games.
- Bottom-Left (Struggles Under Pressure): Quarterbacks in this area perform below the league average in both late-game clutch situations and on 3rd downs.
By visualizing performance in this way, we can have a more nuanced discussion about what it means to be "clutch" and identify players whose value might be missed by more conventional statistics.
from adjustText import adjust_text
# --- Solarized Dark Theme Colors ---
background_color = '#002b36'
plot_area_color = '#073642'
text_color = '#93a1a1'
subtle_text_color = '#839496'
bright_text_color = '#fdf6e3' # New color for high visibility text
grid_color = '#586e75'
accent_color_points = '#b58900' # Solarized Yellow
accent_color_lines = '#dc322f' # Solarized Red
# --- End of Colors ---
# Create the figure and set the overall background color
fig = plt.figure(figsize=(16, 12))
fig.set_facecolor(background_color)
# Create the axes and set the plot area background and grid
ax = plt.axes()
ax.set_facecolor(plot_area_color)
ax.grid(color=grid_color, linestyle='--', linewidth=0.5)
# Set the color of the axis borders (spines) and tick marks
ax.spines['top'].set_color('none')
ax.spines['right'].set_color('none')
ax.spines['left'].set_color(subtle_text_color)
ax.spines['bottom'].set_color(subtle_text_color)
ax.tick_params(colors=subtle_text_color)
# Create the main scatter plot using our accent color
sns.scatterplot(
data=merged_performance_df,
x='cmp_pct_delta',
y='clutch_cmp_pct',
s=120,
alpha=0.8,
color=accent_color_points,
ax=ax,
legend=False
)
# --- INTELLIGENT ANNOTATIONS WITH adjustText ---
# Create a list of text annotations to be adjusted
texts = []
for i, row in merged_performance_df.iterrows():
texts.append(ax.text(
row['cmp_pct_delta'],
row['clutch_cmp_pct'],
row['passer_name'],
fontsize=12,
color=bright_text_color
))
# Automatically adjust text to avoid overlap, adding arrows for clarity
adjust_text(texts, arrowprops=dict(arrowstyle='-', color=grid_color, lw=0.5))
# Add average lines using a different accent color
avg_delta = merged_performance_df['cmp_pct_delta'].mean()
avg_clutch = merged_performance_df['clutch_cmp_pct'].mean()
plt.axvline(x=avg_delta, color=accent_color_lines, linestyle='--', linewidth=1.5, label=f'Avg. Delta ({avg_delta:.1%})')
plt.axhline(y=avg_clutch, color=accent_color_lines, linestyle='--', linewidth=1.5, label=f'Avg. Clutch Cmp % ({avg_clutch:.1%})')
# --- Final Polish with Themed Colors ---
plt.title('Clutch Performance vs. Situational Improvement', fontsize=20, pad=20, color=text_color)
plt.xlabel('Completion % Delta (3rd Down vs. 1st Down)', fontsize=14, color=subtle_text_color)
plt.ylabel('Completion % in Clutch Time (Last 2 Mins of Close Games)', fontsize=14, color=subtle_text_color)
# Style the legend
legend = plt.legend(fontsize=12)
legend.get_frame().set_facecolor(plot_area_color)
legend.get_frame().set_edgecolor(grid_color)
for text in legend.get_texts():
text.set_color(text_color)
plt.tight_layout()
plt.show()
Visualizing Clutch Performance: A Direct Comparison¶
Following our advanced scatter plot, we created a more direct visualization to isolate and rank quarterbacks based purely on their performance in "clutch" situations. This bar chart provides a clear, at-a-glance ranking of quarterback accuracy when the game is on the line.
Chart Methodology:
- Metric: The chart displays the
clutch_cmp_pctfor each quarterback. This metric is defined as the completion percentage on plays occurring in the last two minutes of a close game (score differential of 8 points or less). - Presentation: The quarterbacks are sorted in descending order, from highest to lowest completion percentage, for easy comparison. Each bar is annotated with its precise value, removing any ambiguity.
Key Insights:
This visualization strips away all other variables to answer a single, critical question: "Who is the most accurate passer when the pressure is highest?" It allows stakeholders to immediately identify the top performers in these high-leverage moments. While the previous scatter plot provided a nuanced, multi-dimensional view, this bar chart offers a definitive ranking based on a crucial, singular measure of performance.
# Ensure the clutch performance data is sorted for a clean bar chart
clutch_performance_df_sorted = clutch_performance_df.sort_values(by='clutch_cmp_pct', ascending=False)
# --- Create the Visualization ---
plt.figure(figsize=(12, 10))
sns.set_style("darkgrid")
ax = sns.barplot(
data=clutch_performance_df_sorted,
x='clutch_cmp_pct',
y=clutch_performance_df_sorted.index,
palette='YlGnBu', # Using "_r" reverses the palette for dark-to-light
hue=clutch_performance_df_sorted.index,
legend=False
)
# Add annotations for clarity
for p in ax.patches:
width = p.get_width()
ax.text(width + 1, # Position text slightly to the right of the bar
p.get_y() + p.get_height() / 2,
f'{width:.1f}%', # Format the text as a percentage
va='center')
plt.title('QB Completion %: Last 2 Mins, Close Games ', fontsize=18, fontweight='bold')
plt.xlabel('Completion Percentage (%)', fontsize=12)
plt.ylabel('Quarterback', fontsize=12)
plt.xlim(0, 100) # Set x-axis limit to 100% for context
plt.tight_layout()
plt.show()
Analysis: A Multi-Dimensional View of Clutch Performance¶
Objective: To create a comprehensive, multi-dimensional visualization of quarterback clutch performance. This chart aims to correlate a QB's accuracy in the clutch with their contribution to winning, while also factoring in the total number of winning games they influenced.
Methodology: This chart now encodes three distinct metrics for each quarterback:
Bar Length: Represents the QB's overall Clutch Completion %. A longer bar means higher accuracy in the last 2 minutes of close games.
Bar Color: Represents the Winning Clutch Completion Rate. A darker, richer blue indicates that a higher percentage of the QB's clutch completions occurred in games their team ultimately won.
Star Size: Represents the Total Win Amount. A larger star signifies a greater number of unique winning games in which that quarterback made clutch pass attempts.
How to Interpret This Chart:¶
This visualization allows us to identify the most effective clutch performers by looking for the ideal combination:
A long, dark blue bar with a large star represents the ultimate clutch quarterback: someone who is highly accurate, whose accuracy directly translates into winning outcomes, and who achieves this across a high volume of games.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
try:
# Your code that might produce an error goes here
handles, labels = ax.get_legend_handles_labels()
labels[labels.index('clutch_win_count')] = "Number of Wins" # This line was causing an error
except:
# When an error occurs, Python will execute this block, which does nothing.
pass
# --- Step 1: Engineer the "Win Factor" and "Win Amount" Metrics ---
# Determine the game outcome for each clutch play
clutch_plays_df['game_outcome'] = clutch_plays_df['result'].apply(lambda x: 'Win' if x > 0 else 'Loss')
# Isolate only the completed passes in clutch situations
clutch_completions = clutch_plays_df[clutch_plays_df['is_completion'] == True].copy()
# Group by QB to count total clutch completions and those in wins
win_loss_completion_counts = clutch_completions.groupby('passer_name')['game_outcome'].value_counts().unstack(fill_value=0)
# Ensure both 'Win' and 'Loss' columns exist
if 'Win' not in win_loss_completion_counts:
win_loss_completion_counts['Win'] = 0
if 'Loss' not in win_loss_completion_counts:
win_loss_completion_counts['Loss'] = 0
# Calculate the Winning Clutch Completion Rate
win_loss_completion_counts['total_clutch_completions'] = win_loss_completion_counts['Win'] + win_loss_completion_counts['Loss']
win_loss_completion_counts['winning_clutch_rate'] = (win_loss_completion_counts['Win'] / win_loss_completion_counts['total_clutch_completions']) * 100
# NEW: Rename the 'Win' column to be our 'clutch_win_count'
win_loss_completion_counts.rename(columns={'Win': 'clutch_win_count'}, inplace=True)
# --- Step 2: Merge Metrics into our Main Clutch DataFrame ---
enhanced_clutch_df = clutch_performance_df.merge(
win_loss_completion_counts[['winning_clutch_rate', 'clutch_win_count']],
left_index=True,
right_index=True,
how='left'
).fillna(0)
# Sort the data for a clean bar chart
enhanced_clutch_df_sorted = enhanced_clutch_df.sort_values(by='clutch_cmp_pct', ascending=False)
# --- Step 3: Create the Enhanced Visualization ---
plt.figure(figsize=(16, 12))
sns.set_style("darkgrid")
ax = plt.gca()
# Create the base bar plot
sns.barplot(
data=enhanced_clutch_df_sorted,
x='clutch_cmp_pct',
y=enhanced_clutch_df_sorted.index,
hue='winning_clutch_rate',
palette='YlGnBu',
dodge=False,
ax=ax
)
# --- NEW: Overlay a scatter plot for the star markers ---
sns.scatterplot(
data=enhanced_clutch_df_sorted,
x='clutch_cmp_pct',
y=enhanced_clutch_df_sorted.index,
size='clutch_win_count', # Use win count for size
sizes=(50, 500), # Set a min and max size for the stars
marker='*', # Use a star marker
color='gold',
edgecolor='black',
ax=ax,
legend='brief' # Add a legend for the star sizes
)
# --- Add a Color Bar for the bar colors ---
norm = plt.Normalize(0, 100) # Normalize color bar from 0 to 100%
sm = plt.cm.ScalarMappable(cmap="YlGnBu", norm=norm)
sm.set_array([])
cbar = plt.colorbar(sm, ax=ax, fraction=0.046, pad=0.04)
cbar.set_label('% of Clutch Completions in Winning Games', rotation=270, labelpad=20)
# Add completion % annotations
# Add completion % annotations, skipping bars with a width of 0
for p in ax.patches:
width = p.get_width()
if width > 0: # Only add text if the bar has a positive width
ax.text(width + 1.5,
p.get_y() + p.get_height() / 2,
f'{width:.1f}%',
va='center')
# Polish the plot
plt.title('Multi-Factor QB Clutch Performance', fontsize=18, fontweight='bold')
plt.xlabel('Overall Completion % in Clutch Situations', fontsize=12)
plt.ylabel('Quarterback', fontsize=12)
plt.xlim(0, 105) # Adjust x-axis to make room for labels
# This will print the exact list of labels that Seaborn created.
# print("Automatically generated legend labels:", labels)
# --- The Correction ---
try:
# Look at the output above. The first item in the list is the title.
# Replace that first item with your desired title.
original_size_title = labels[0] # This gets the actual title, whatever it is
new_size_title = "Number of Wins"
# Find the index of the old title and replace it with the new one.
labels[labels.index(original_size_title)] = new_size_title
except (ValueError, IndexError):
print(f"Error: Could not automatically find the legend title to replace.")
print("Please check the 'Original legend labels' output above to find the correct string.")
# Now, rebuild the legend with the updated labels
ax.legend(handles, labels, title='Legend', loc='lower right', facecolor='lightgray')
plt.tight_layout()
plt.show() #command should come after this block
Error: Could not automatically find the legend title to replace. Please check the 'Original legend labels' output above to find the correct string.
Analysis: QB Completion % in Clutch Situations¶
This visualization ranks qualified quarterbacks by their completion percentage during "clutch" time. We've defined this as pass plays that occur within the last two minutes of a close game, where the score differential is between -8 and +7 points.
The resulting horizontal bar chart provides a clear and immediate ranking of passing accuracy when the game's outcome is on the line. By sorting the quarterbacks from highest to lowest completion percentage, we can easily identify the most reliable passers in these high-pressure, late-game scenarios. This analysis moves beyond general season-long stats to focus on a specific, high-leverage aspect of quarterback play.
# After your clutch performance visualization
print(f"⚠️ Sample Size Note: Based on {len(clutch_performance_df)} QBs with 10+ clutch attempts")
print("Statistical Interpretation: Results are indicative but may vary with larger samples.")
⚠️ Sample Size Note: Based on 32 QBs with 10+ clutch attempts Statistical Interpretation: Results are indicative but may vary with larger samples.
# Filter the original DataFrame for 4th down pass plays
fourth_down_df = pass_plays_df[pass_plays_df['down'] == 4.0].copy()
# Aggregate the performance stats for each quarterback on 4th down
fourth_down_stats_df = fourth_down_df.groupby('passer_name').agg(
fourth_down_attempts=('play_id', 'count'),
fourth_down_completions=('is_completion', 'sum'),
fourth_down_yards=('yards_gained', 'sum'),
fourth_down_tds=('is_touchdown', 'sum'),
fourth_down_ints=('is_interception', 'sum')
)
# Calculate performance metrics, adding a small epsilon to avoid division by zero
epsilon = 1e-6
fourth_down_stats_df['fourth_down_cmp_pct'] = (fourth_down_stats_df['fourth_down_completions'] / (fourth_down_stats_df['fourth_down_attempts'] + epsilon)) * 100
fourth_down_stats_df['fourth_down_ypa'] = fourth_down_stats_df['fourth_down_yards'] / (fourth_down_stats_df['fourth_down_attempts'] + epsilon)
# For a meaningful analysis, let's filter for QBs with at least 5 attempts on 4th down
qualified_fourth_down_qbs = fourth_down_stats_df[fourth_down_stats_df['fourth_down_attempts'] >= 5]
# Sort by completion percentage and display the new DataFrame
qualified_fourth_down_qbs_sorted = qualified_fourth_down_qbs.sort_values(by='fourth_down_cmp_pct', ascending=False)
print("--- QB Performance on 4th Down (min. 5 attempts) ---")
print(qualified_fourth_down_qbs_sorted.round(2).head(10))
--- QB Performance on 4th Down (min. 5 attempts) ---
fourth_down_attempts fourth_down_completions \
passer_name
Z.Wilson 6 6
T.Boyle 6 5
J.Burrow 5 4
R.Tannehill 5 4
M.Stafford 8 6
K.Cousins 10 7
S.Howell 19 13
T.Tagovailoa 17 11
J.Hurts 13 8
J.Love 18 11
fourth_down_yards fourth_down_tds fourth_down_ints \
passer_name
Z.Wilson 64.0 0 0
T.Boyle 39.0 0 1
J.Burrow 25.0 0 0
R.Tannehill 50.0 0 0
M.Stafford 57.0 1 0
K.Cousins 86.0 1 0
S.Howell 129.0 2 0
T.Tagovailoa 129.0 0 0
J.Hurts 88.0 1 0
J.Love 111.0 4 0
fourth_down_cmp_pct fourth_down_ypa
passer_name
Z.Wilson 100.00 10.67
T.Boyle 83.33 6.50
J.Burrow 80.00 5.00
R.Tannehill 80.00 10.00
M.Stafford 75.00 7.12
K.Cousins 70.00 8.60
S.Howell 68.42 6.79
T.Tagovailoa 64.71 7.59
J.Hurts 61.54 6.77
J.Love 61.11 6.17
Analysis: QB Performance on 4th Down¶
Now, we will visualize quarterback performance on what is arguably the most critical down in football: 4th down. On these plays, the offense must gain the required yardage to continue their drive, making each pass a high-stakes event.
To create a meaningful comparison, we first filtered our dataset to include only 4th down pass plays. From this subset, we calculated the completion percentage for each quarterback who had at least five 4th down attempts during the season. This qualification threshold ensures our analysis is based on a reasonable sample size.
The following bar chart ranks these qualified quarterbacks by their 4th down completion percentage. This visualization highlights which passers are most effective at converting in these "do-or-die" situations, offering a powerful metric for evaluating performance under extreme situational pressure.
Analysis: 4th Down Accuracy and Its Correlation to Wins¶
This visualization moves beyond simple 4th down completion percentage to ask a more critical question: Does a quarterback's accuracy on these crucial downs actually contribute to winning?
To answer this, the chart now visualizes three distinct metrics for each qualified quarterback (min. 5 total 4th down attempts):
Bar Length: Represents the QB's overall 4th Down Completion %. A longer bar means higher accuracy on 4th down.
Bar Color: Represents the Winning 4th Down Completion Rate. A darker, richer color indicates that a higher percentage of the QB's 4th down completions occurred in games their team ultimately won.
Star Size: Represents the Total Win Amount. A larger star signifies a greater number of unique winning games in which that quarterback made 4th down pass attempts.
The ideal performer in this chart is a player with a long, dark-colored bar and a large star, indicating a QB who is accurate on 4th down, does it in winning efforts, and does so frequently.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# --- Step 1: Prepare the Data for Win/Loss Analysis on 4th Down ---
# Assuming 'pass_plays_df' is your base DataFrame
# Filter for only 4th down pass plays
fourth_down_df = pass_plays_df[pass_plays_df['down'] == 4.0].copy()
# Determine the game outcome for the passing team
fourth_down_df['game_outcome'] = fourth_down_df['result'].apply(lambda x: 'Win' if x > 0 else 'Loss')
# Aggregate overall 4th down stats for our base metrics
base_fourth_down_stats = fourth_down_df.groupby('passer_name').agg(
total_attempts=('play_id', 'count'),
total_completions=('is_completion', 'sum')
)
base_fourth_down_stats['overall_cmp_pct'] = (base_fourth_down_stats['total_completions'] / base_fourth_down_stats['total_attempts']) * 100
# Aggregate win/loss specific stats for the "win factor"
win_loss_completions = fourth_down_df[fourth_down_df['is_completion'] == True].groupby('passer_name')['game_outcome'].value_counts().unstack(fill_value=0)
if 'Win' not in win_loss_completions: win_loss_completions['Win'] = 0
win_loss_completions['winning_completion_rate'] = (win_loss_completions['Win'] / (win_loss_completions['Win'] + win_loss_completions['Loss'])) * 100
win_loss_completions.rename(columns={'Win': 'win_count'}, inplace=True)
# --- Step 2: Merge and Filter ---
# Combine the metrics into one DataFrame
enhanced_fourth_down_df = base_fourth_down_stats.merge(
win_loss_completions[['winning_completion_rate', 'win_count']],
left_index=True,
right_index=True,
how='left'
).fillna(0)
# Filter for QBs with at least 5 total attempts and sort
qualified_df = enhanced_fourth_down_df[enhanced_fourth_down_df['total_attempts'] >= 5].sort_values(by='overall_cmp_pct', ascending=False)
# --- Step 3: Visualize ---
# --- Solarized Light Theme Colors ---
background_color = '#fdf6e3'
plot_area_color = '#fdf6e3'
text_color = '#657b83'
grid_color = '#eee8d5'
plt.figure(figsize=(16, 12))
sns.set_style("darkgrid")
ax = plt.gca()
fig = plt.gcf()
fig.set_facecolor(background_color)
ax.set_facecolor(plot_area_color)
ax.grid(axis='x', color=grid_color, linestyle='-')
ax.spines['top'].set_visible(False); ax.spines['right'].set_visible(False)
ax.spines['left'].set_color(text_color); ax.spines['bottom'].set_color(text_color)
ax.tick_params(colors=text_color)
# Bar plot for completion % (length) and winning rate (color)
sns.barplot(
data=qualified_df,
x='overall_cmp_pct',
y=qualified_df.index,
hue='winning_completion_rate',
palette='YlGnBu',
dodge=False,
ax=ax,
legend=False
)
# Scatter plot for win count (star size)
sns.scatterplot(
data=qualified_df,
x='overall_cmp_pct',
y=qualified_df.index,
size='win_count',
sizes=(50, 500),
marker='*',
color='gold',
edgecolor='black',
ax=ax,
legend='brief'
)
# Color bar for the winning completion rate
norm = plt.Normalize(0, 100)
sm = plt.cm.ScalarMappable(cmap="YlGnBu", norm=norm)
sm.set_array([])
cbar = plt.colorbar(sm, ax=ax, fraction=0.046, pad=0.04)
cbar.set_label('% of 4th Down Completions in Winning Games', rotation=270, labelpad=20, color=text_color)
cbar.ax.tick_params(colors=text_color)
plt.title('4th Down Completion % and Its Impact on Wins', fontsize=18, fontweight='bold', color=text_color)
plt.xlabel('Overall 4th Down Completion %', fontsize=12, color=text_color)
plt.ylabel('Quarterback', fontsize=12, color=text_color)
plt.xlim(0, 110)
# This will print the exact list of labels that Seaborn created.
print("Automatically generated legend labels:", labels)
# --- The Correction ---
try:
# Look at the output above. The first item in the list is the title.
# Replace that first item with your desired title.
original_size_title = labels[0] # This gets the actual title, whatever it is
new_size_title = "Number of Wins"
# Find the index of the old title and replace it with the new one.
labels[labels.index(original_size_title)] = new_size_title
except (ValueError, IndexError):
print(f"Error: Could not automatically find the legend title to replace.")
print("Please check the 'Original legend labels' output above to find the correct string.")
# Now, rebuild the legend with the updated labels
ax.legend(handles, labels, title='Legend', loc='lower right', facecolor='lightgray')
plt.tight_layout()
plt.show()
# The plt.show() command should come after this block
Automatically generated legend labels: [] Error: Could not automatically find the legend title to replace. Please check the 'Original legend labels' output above to find the correct string.
Analysis: Focusing on What Matters - 4th Down Conversions¶
A simple completion on 4th down is not enough; the true measure of success is whether the play results in a first down. This analysis shifts the focus from mere completions to successful conversions, providing a much clearer picture of which quarterbacks deliver when the game is on the line.
The visualization now ranks quarterbacks by their 4th down conversion rate, the most critical success metric for these plays. The annotations provide crucial supporting context:
(Conversions / Attempts): The raw count of successful conversions out of total attempts.
Cmp %: The quarterback's overall completion percentage on 4th down, offered as a secondary metric.
This approach allows us to identify players who are not just accurate, but effective at moving the chains under maximum pressure.
Application for Team Management For a General Manager or a front office, this type of granular analysis is invaluable for player acquisition and team-building for several reasons:
Identifying True Clutch Performers: This chart separates quarterbacks who make the crucial play from those who might accumulate "empty" stats. A high conversion rate, especially with a significant number of attempts, is a strong indicator of a player who can be trusted in game-deciding moments.
Finding Undervalued Assets: A player with a modest overall completion percentage but a very high 4th down conversion rate could be an undervalued asset. This analysis can uncover players who have a specific, valuable skill set that traditional stats might overlook.
Informing In-Game Decision Making: For a coaching staff, knowing which quarterbacks have a proven history of converting on 4th down can directly influence play-calling and the decision to "go for it" versus punting or kicking a field goal. It provides a data-driven foundation for taking calculated risks.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Assuming pass_plays_df is pre-loaded and prepared
# Placeholder for the dataframe if it's not loaded
# pass_plays_df = pd.DataFrame({
# 'down': [4, 4, 4, 4, 4, 4],
# 'passer_name': ['QB1', 'QB1', 'QB2', 'QB2', 'QB3', 'QB3'],
# 'play_id': range(6),
# 'is_completion': [True, False, True, True, True, True],
# 'first_down_pass': [1.0, 0.0, 1.0, 0.0, 1.0, 1.0],
# 'yards_gained': [10, 0, 12, 2, 15, 5]
# })
# epsilon = 1e-6
# --- 4th Down Performance Analysis ---
fourth_down_df = pass_plays_df[pass_plays_df['down'] == 4.0].copy()
# --- Identify successful conversions ---
fourth_down_df['is_conversion'] = np.where(fourth_down_df['first_down_pass'] == 1.0, 1, 0)
# Aggregate performance on 4th down
fourth_down_stats = fourth_down_df.groupby('passer_name').agg(
fourth_down_attempts=('play_id', 'count'),
fourth_down_completions=('is_completion', 'sum'),
fourth_down_conversions=('is_conversion', 'sum')
)
# Calculate 4th down CONVERSION rate
fourth_down_stats['fourth_down_conversion_rate'] = (fourth_down_stats['fourth_down_conversions'] / (fourth_down_stats['fourth_down_attempts'] + epsilon)) * 100
# Filter for QBs with a meaningful number of attempts
qualified_fourth_down_qbs = fourth_down_stats[fourth_down_stats['fourth_down_attempts'] >= 5].sort_values(by='fourth_down_conversion_rate', ascending=False)
# --- NEW: Calculate Averages for Qualified QBs ---
avg_conv_rate = qualified_fourth_down_qbs['fourth_down_conversion_rate'].mean()
avg_conv_amount = qualified_fourth_down_qbs['fourth_down_conversions'].mean()
avg_attempts = qualified_fourth_down_qbs['fourth_down_attempts'].mean()
# --- Visualize 4th Down CONVERSION Rate ---
plt.figure(figsize=(14, 10)) # Increased size for readability
sns.set_style("darkgrid")
ax = sns.barplot(
data=qualified_fourth_down_qbs,
x='fourth_down_conversion_rate',
y=qualified_fourth_down_qbs.index,
palette='viridis'
)
plt.title('4th-Down Conversion Rates (5 Attempts Min.)', fontsize=18, fontweight='bold')
plt.xlabel('Conversion Percentage (%)', fontsize=14)
plt.ylabel('Quarterback', fontsize=14)
# --- NEW: Add Average Line ---
ax.axvline(x=avg_conv_rate, color='red', linestyle='--', linewidth=1.5, label=f'Avg. Conv. Rate ({avg_conv_rate:.1f}%)')
# --- NEW: Add Text Box with Other Averages ---
avg_text = (f'League Averages (Qualified QBs):\n'
f'Conversion Amount: {avg_conv_amount:.1f}\n'
f'4th Down Attempts: {avg_attempts:.1f}')
# Position the text box in the bottom right corner
plt.text(0.95, 0.15, avg_text, transform=ax.transAxes, fontsize=12,
verticalalignment='top', horizontalalignment='right',
bbox=dict(boxstyle='round,pad=0.5', fc='wheat', alpha=0.5))
# Add annotations for individual player stats
for i, (p, row) in enumerate(zip(ax.patches, qualified_fourth_down_qbs.itertuples())):
attempts = int(row.fourth_down_attempts)
conversions = int(row.fourth_down_conversions)
cmp_pct = (row.fourth_down_completions / attempts) * 100 if attempts > 0 else 0
annotation_text = f'({conversions}/{attempts} conv, {cmp_pct:.0f}% cmp)'
ax.text(p.get_width() + 0.5, p.get_y() + p.get_height() / 2, annotation_text, va='center')
# Ensure the legend for the average line is displayed
ax.legend()
plt.tight_layout()
plt.show()
/var/folders/3j/nldb3n550ml79m4rq2xbs5pc0000gn/T/ipykernel_9395/1726249704.py:46: FutureWarning: Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `y` variable to `hue` and set `legend=False` for the same effect.
# After 4th down visualization
print(f"📊 Statistical Context: {len(qualified_fourth_down_qbs)} QBs analyzed (min. 5 attempts)")
print("Confidence Level: Moderate - 4th down plays are naturally limited")
📊 Statistical Context: 39 QBs analyzed (min. 5 attempts) Confidence Level: Moderate - 4th down plays are naturally limited
3. Time-Series Performance Analysis: QB Trajectories¶
Analysis:
While single-season stats provide a useful snapshot, analyzing performance trends over multiple seasons reveals crucial insights into a quarterback's consistency, development, and career arc. To explore this, we've plotted the season-by-season passer rating for the top 15 quarterbacks by total passing yards from 2019-2023.
To avoid a cluttered single chart, a "small multiples" visualization is used, providing a clear, individual chart for each quarterback. The horizontal red line on each chart indicates the average passer rating for all qualified QBs during this five-year period, offering immediate context on whether a player performed above or below the league average.
Key Observations:
- Sustained Elite Tier: Players like Patrick Mahomes and Aaron Rodgers consistently perform well above the league average, demonstrating a clear top tier of efficiency.
- Veteran Consistency: Tom Brady showcases remarkable consistency, defying age with elite performance throughout the period. Kirk Cousins also demonstrates consistent above-average play.
- Emerging Stars: We can clearly see the upward trajectories of younger quarterbacks like Justin Herbert and Joe Burrow as they established themselves as elite passers.
- Career Resurgence: The chart for Geno Smith tells a powerful story of a player with limited action who had a dramatic, league-leading performance spike in 2022.
This time-series view provides a vital narrative layer to the data, setting the stage for our next step: using machine learning to identify distinct statistical archetypes within this group of top passers.
import pandas as pd
import nfl_data_py as nfl
# Define the years we want
years = range(2019, 2024)
print("--- STEP 1: Loading Seasonal Stats ---")
# Load the seasonal stats data
seasonal_stats_df = nfl.import_seasonal_data(years, 'REG')
# Confirm it loaded
print("Success! 'seasonal_stats_df' is now defined.")
--- STEP 1: Loading Seasonal Stats --- Success! 'seasonal_stats_df' is now defined.
print("\n--- STEP 2: Loading Player Data ---")
# Load the player data
player_df = nfl.import_players()
# Confirm it loaded
print("Success! 'player_df' is now defined.")
--- STEP 2: Loading Player Data --- Success! 'player_df' is now defined.
print("\n--- STEP 3: Merging the DataFrames ---")
# Reset the index of the player_df to make 'gsis_id' a column
player_df_reset = player_df.reset_index()
# Perform the merge using the correct keys
merged_df = pd.merge(
seasonal_stats_df,
player_df_reset[['gsis_id', 'display_name']],
left_on='player_id',
right_on='gsis_id',
how='left'
)
# Confirm the merge was successful
print("Success! 'merged_df' is now defined.")
--- STEP 3: Merging the DataFrames --- Success! 'merged_df' is now defined.
import nfl_data_py as nfl
print("\n--- STEP 2: Loading Player Data ---")
# Load the player data using the function from the documentation
player_df = nfl.import_players()
# Confirm it loaded by printing the first few rows and the columns
print("Success! Player data loaded. Here's a sample and the available columns:")
print(player_df.head())
print("\nPlayer DataFrame Columns:")
print(player_df.columns)
--- STEP 2: Loading Player Data ---
Success! Player data loaded. Here's a sample and the available columns:
gsis_id display_name common_first_name first_name last_name \
0 00-0028830 Isaako Aaitui Isaako Isaako Aaitui
1 00-0038389 Israel Abanikanda Israel Israel Abanikanda
2 00-0024644 Jon Abbate Jon Jon Abbate
3 ABB498348 Vince Abbott Vince Vincent Abbott
4 00-0031021 Jared Abbrederis Jared Jared Abbrederis
short_name football_name suffix esb_id nfl_id pfr_id pff_id \
0 None None None AAI622937 None AaitIs00 6998
1 I.Abanikanda Israel None ABA159567 56008 AbanIs00 122999
2 None None None ABB051371 None None None
3 None None None ABB498348 None abbotvin01 None
4 J.Abbrederis Jared None ABB650964 41405 AbbrJa00 8811
otc_id espn_id smart_id birth_date \
0 2535 14856 32004141-4962-2937-61ff-017b1804dec6 1987-01-25
1 10967 4429202 32004142-4115-9567-2e24-0eab29f6a4b9 2002-10-05
2 None 10801 32004142-4205-1371-db95-1abc96313b69 1985-06-18
3 None None 32004142-4249-8348-e00f-5fbbe6a0c73c 1958-05-31
4 3115 16836 32004142-4265-0964-fc36-bb0ad76ff6e6 1990-12-17
position_group position ngs_position_group ngs_position height weight \
0 DL NT None None 76.0 307.0
1 RB RB None None 70.0 216.0
2 LB LB None None 71.0 245.0
3 SPEC K None None 71.0 207.0
4 WR WR WR WR 73.0 195.0
headshot \
0 https://static.www.nfl.com/image/private/{formatInstructions}/league/hwncbbaztu3pc5unqgnj
1 https://static.www.nfl.com/image/private/{formatInstructions}/league/ythhca1bq2bjbhgfyf9o
2 https://static.www.nfl.com/image/private/{formatInstructions}/league/gi1ncvxcsz8vyi5a4fxp
3 https://static.www.nfl.com/image/private/{formatInstructions}/league/g1xvyvzrfbrtbeqjvqgf
4 https://static.www.nfl.com/image/private/{formatInstructions}/league/p5gqmcyci9youm2r6oeb
college_name college_conference \
0 UNLV None
1 Pittsburgh Atlantic Coast Conference
2 Wake Forest None
3 California State-Fullerton; Washington None
4 Wisconsin None
jersey_number rookie_season last_season latest_team status ngs_status \
0 0 2011 2014 WAS DEV None
1 20 2023 2025 SF ACT ACT
2 67 2007 2007 HOU RES None
3 0 1987 1988 LAC ACT None
4 10 2014 2017 DET CUT CUT
ngs_status_short_description years_of_experience pff_position pff_status \
0 None 2 DI None
1 Active 3 HB A
2 None 0 None None
3 None 2 None None
4 None 4 WR None
draft_year draft_round draft_pick draft_team
0 NaN NaN NaN None
1 2023.0 5.0 143.0 NYJ
2 NaN NaN NaN None
3 NaN NaN NaN None
4 2014.0 5.0 176.0 GB
Player DataFrame Columns:
Index(['gsis_id', 'display_name', 'common_first_name', 'first_name',
'last_name', 'short_name', 'football_name', 'suffix', 'esb_id',
'nfl_id', 'pfr_id', 'pff_id', 'otc_id', 'espn_id', 'smart_id',
'birth_date', 'position_group', 'position', 'ngs_position_group',
'ngs_position', 'height', 'weight', 'headshot', 'college_name',
'college_conference', 'jersey_number', 'rookie_season', 'last_season',
'latest_team', 'status', 'ngs_status', 'ngs_status_short_description',
'years_of_experience', 'pff_position', 'pff_status', 'draft_year',
'draft_round', 'draft_pick', 'draft_team'],
dtype='object')
import pandas as pd
# Assuming player_df is loaded from Step 2
print("\n--- Let's Find the Right Column Name ---")
# Reset the index to ensure all data is in columns
player_df_reset = player_df.reset_index()
# Print all the column names from this DataFrame
# The player ID is in this list. We need to find it.
print("Columns available in the Player DataFrame are:")
print(player_df_reset.columns)
--- Let's Find the Right Column Name ---
Columns available in the Player DataFrame are:
Index(['index', 'gsis_id', 'display_name', 'common_first_name', 'first_name',
'last_name', 'short_name', 'football_name', 'suffix', 'esb_id',
'nfl_id', 'pfr_id', 'pff_id', 'otc_id', 'espn_id', 'smart_id',
'birth_date', 'position_group', 'position', 'ngs_position_group',
'ngs_position', 'height', 'weight', 'headshot', 'college_name',
'college_conference', 'jersey_number', 'rookie_season', 'last_season',
'latest_team', 'status', 'ngs_status', 'ngs_status_short_description',
'years_of_experience', 'pff_position', 'pff_status', 'draft_year',
'draft_round', 'draft_pick', 'draft_team'],
dtype='object')
import pandas as pd
# Assuming seasonal_stats_df and player_df are already loaded.
print("\n--- STEP 4: Merging with the Correct Column Names ---")
# Reset the index of the player_df to make 'gsis_id' a column
player_df_reset = player_df.reset_index()
# We know the keys are 'player_id' in the stats table and 'gsis_id' in the player table.
# Let's perform the merge.
merged_df = pd.merge(
seasonal_stats_df,
player_df_reset[['gsis_id', 'display_name']], # Select only the columns we need
left_on='player_id',
right_on='gsis_id',
how='left'
)
# Confirm the merge was successful.
print("Success! Merge complete. The 'display_name' column is now in our dataset.")
print(merged_df.head())
--- STEP 4: Merging with the Correct Column Names ---
Success! Merge complete. The 'display_name' column is now in our dataset.
player_id season season_type completions attempts passing_yards \
0 00-0019596 2019 REG 373 613 4057.0
1 00-0019596 2020 REG 401 610 4633.0
2 00-0019596 2021 REG 485 719 5316.0
3 00-0019596 2022 REG 490 733 4694.0
4 00-0020531 2019 REG 281 378 2979.0
passing_tds interceptions sacks sack_yards sack_fumbles \
0 24 8.0 27.0 185.0 3
1 40 12.0 21.0 143.0 1
2 43 12.0 22.0 144.0 3
3 25 9.0 22.0 160.0 3
4 27 4.0 12.0 89.0 0
sack_fumbles_lost passing_air_yards passing_yards_after_catch \
0 1 4613.0 1863.0
1 0 5532.0 1810.0
2 2 5804.0 2534.0
3 2 5027.0 2292.0
4 0 2425.0 1495.0
passing_first_downs passing_epa passing_2pt_conversions pacr \
0 193.0 31.495537 1 14.739260
1 233.0 133.306174 0 13.480107
2 269.0 145.714884 0 16.026899
3 237.0 61.906270 2 16.329448
4 159.0 93.795577 0 13.487758
dakota carries rushing_yards rushing_tds rushing_fumbles \
0 1.374107 26 34.0 3 0.0
1 2.854643 30 6.0 3 3.0
2 2.457566 28 81.0 2 1.0
3 1.556584 29 -1.0 1 2.0
4 1.879901 9 -4.0 1 0.0
rushing_fumbles_lost rushing_first_downs rushing_epa \
0 0.0 8.0 0.831919
1 1.0 6.0 -18.186052
2 1.0 14.0 3.850479
3 2.0 5.0 -20.469321
4 0.0 2.0 2.232562
rushing_2pt_conversions receptions targets receiving_yards \
0 0 0 0 0.0
1 0 0 0 0.0
2 0 0 0 0.0
3 0 0 1 0.0
4 0 0 0 0.0
receiving_tds receiving_fumbles receiving_fumbles_lost \
0 0 0.0 0.0
1 0 0.0 0.0
2 0 0.0 0.0
3 0 0.0 0.0
4 0 0.0 0.0
receiving_air_yards receiving_yards_after_catch receiving_first_downs \
0 0.0 0.0 0.0
1 0.0 0.0 0.0
2 0.0 0.0 0.0
3 16.0 0.0 0.0
4 0.0 0.0 0.0
receiving_epa receiving_2pt_conversions racr target_share \
0 0.000000 0 0.0 0.000000
1 0.000000 0 0.0 0.000000
2 0.000000 0 0.0 0.000000
3 -4.726016 0 0.0 0.034483
4 0.000000 0 0.0 0.000000
air_yards_share wopr_x special_teams_tds fantasy_points \
0 0.000000 0.000000 0.0 263.68
1 0.000000 0.000000 0.0 337.92
2 0.000000 0.000000 0.0 374.74
3 0.061303 0.094636 0.0 271.66
4 0.000000 0.000000 0.0 224.76
fantasy_points_ppr games tgt_sh ay_sh yac_sh wopr_y ry_sh \
0 263.68 16 0.000000 0.0000 0.0 0.000000 0.0
1 337.92 16 0.000000 0.0000 0.0 0.000000 0.0
2 374.74 17 0.000000 0.0000 0.0 0.000000 0.0
3 271.66 17 0.001332 0.0031 0.0 0.004477 0.0
4 224.76 11 0.000000 0.0000 0.0 0.000000 0.0
rtd_sh rfd_sh rtdfd_sh dom w8dom yptmpa ppr_sh gsis_id \
0 0.0 0.0 0.0 0.0 0.0 0.0 0.178921 00-0019596
1 0.0 0.0 0.0 0.0 0.0 0.0 0.197091 00-0019596
2 0.0 0.0 0.0 0.0 0.0 0.0 0.195705 00-0019596
3 0.0 0.0 0.0 0.0 0.0 0.0 0.175012 00-0019596
4 0.0 0.0 0.0 0.0 0.0 0.0 0.190652 00-0020531
display_name
0 Tom Brady
1 Tom Brady
2 Tom Brady
3 Tom Brady
4 Drew Brees
import pandas as pd
# Assuming 'merged_df' exists from the last successful step.
print("\n--- STEP 5: Calculating Passer Rating ---")
# First, filter for QBs with a meaningful number of attempts.
qualified_df = merged_df[merged_df['attempts'] > 100].copy()
# Calculate the four components of the NFL passer rating formula
c = ((qualified_df['completions'] / qualified_df['attempts']) - 0.3) * 5
y = ((qualified_df['passing_yards'] / qualified_df['attempts']) - 3) * 0.25
t = (qualified_df['passing_tds'] / qualified_df['attempts']) * 20
i = 2.375 - ((qualified_df['interceptions'] / qualified_df['attempts']) * 25)
# The result of each component is capped between 0 and 2.375
c = c.clip(0, 2.375)
y = y.clip(0, 2.375)
t = t.clip(0, 2.375)
i = i.clip(0, 2.375)
# Final passer rating calculation
qualified_df['passer_rating'] = ((c + y + t + i) / 6) * 100
print("Success! Passer rating calculated and added as a new column.")
print("Here's a sample of the data with the new 'passer_rating' column:")
# Show the new column at the end
print(qualified_df[['display_name', 'season', 'attempts', 'passer_rating']].head())
--- STEP 5: Calculating Passer Rating --- Success! Passer rating calculated and added as a new column. Here's a sample of the data with the new 'passer_rating' column: display_name season attempts passer_rating 0 Tom Brady 2019 613 87.979201 1 Tom Brady 2020 610 102.172131 2 Tom Brady 2021 719 102.083333 3 Tom Brady 2022 733 90.725898 4 Drew Brees 2019 378 116.269841
import matplotlib.pyplot as plt
import seaborn as sns
# Assuming 'qualified_df' exists from our last successful step.
print("\n--- FINAL STEP: Visualizing the Top 20 QBs by Passing Yards ---")
# The list of the Top 20 QBs from your screenshot
target_qbs = [
'Lamar Jackson', 'Patrick Mahomes', 'Josh Allen',
'Derek Carr', 'Matthew Stafford', 'Justin Herbert',
'Jared Goff','Dak Prescott', 'Joe Burrow', 'Geno Smith',
'Russell Wilson', 'Jalen Hurts','Baker Mayfield', 'Aaron Rodgers',
'Kyler Murray', 'Tua Tagovailoa', 'Trevor Lawrence', 'Daniel Jones',
'Sam Darnold', 'Kirk Cousins'
]
# Filter our data to include only these specific players
found_qbs_df = qualified_df[qualified_df['display_name'].isin(target_qbs)]
# --- ✅ Final Visualization ---
plt.style.use('seaborn-v0_8-whitegrid')
plt.figure(figsize=(14, 8))
sns.lineplot(
data=found_qbs_df,
x='season',
y='passer_rating',
hue='display_name',
marker='o',
linewidth=2.5
)
plt.title('Top 20 QBs by Passing Yards: Passer Rating Trend (2019-2023)', fontsize=18, fontweight='bold')
plt.xlabel('Season', fontsize=12)
plt.ylabel('Calculated Passer Rating', fontsize=12)
plt.xticks(list(range(2019, 2024)))
plt.legend(title='Quarterback', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, which='both', linestyle='--', linewidth=0.5)
plt.tight_layout()
plt.show()
print("\n--- Final Data Used for Plotting ---")
print(found_qbs_df[['display_name', 'season', 'passer_rating']].sort_values(by=['display_name', 'season']))
--- FINAL STEP: Visualizing the Top 20 QBs by Passing Yards ---
--- Final Data Used for Plotting ---
display_name season passer_rating
25 Aaron Rodgers 2019 95.390305
26 Aaron Rodgers 2020 121.530418
27 Aaron Rodgers 2021 111.899718
28 Aaron Rodgers 2022 91.067036
1776 Baker Mayfield 2019 79.832555
1777 Baker Mayfield 2020 95.901920
1778 Baker Mayfield 2021 83.124003
1779 Baker Mayfield 2022 78.986318
1780 Baker Mayfield 2023 94.574499
933 Dak Prescott 2019 99.692394
934 Dak Prescott 2020 99.605856
935 Dak Prescott 2021 104.215604
936 Dak Prescott 2022 91.127327
937 Dak Prescott 2023 105.868644
2233 Daniel Jones 2019 87.658860
2234 Daniel Jones 2020 80.422247
2235 Daniel Jones 2021 84.816482
2236 Daniel Jones 2022 92.522952
2237 Daniel Jones 2023 70.546875
468 Derek Carr 2019 100.800032
469 Derek Carr 2020 101.398291
470 Derek Carr 2021 93.962993
471 Derek Carr 2022 86.263280
472 Derek Carr 2023 97.718978
391 Geno Smith 2022 100.874126
392 Geno Smith 2023 92.130094
2550 Jalen Hurts 2020 77.561937
2551 Jalen Hurts 2021 87.191358
2552 Jalen Hurts 2022 101.548913
2553 Jalen Hurts 2023 89.118649
951 Jared Goff 2019 86.468317
952 Jared Goff 2020 90.036232
953 Jared Goff 2021 91.531714
954 Jared Goff 2022 99.315020
955 Jared Goff 2023 97.916667
2603 Joe Burrow 2020 89.830858
2604 Joe Burrow 2021 108.261218
2605 Joe Burrow 2022 100.783828
2606 Joe Burrow 2023 90.998858
1781 Josh Allen 2019 85.317245
1782 Josh Allen 2020 107.153263
1783 Josh Allen 2021 92.169763
1784 Josh Allen 2022 96.608613
1785 Josh Allen 2023 92.224381
2515 Justin Herbert 2020 98.273810
2516 Justin Herbert 2021 97.656250
2517 Justin Herbert 2022 93.159871
2518 Justin Herbert 2023 93.220029
245 Kirk Cousins 2019 107.404279
246 Kirk Cousins 2020 104.998385
247 Kirk Cousins 2021 103.100862
248 Kirk Cousins 2022 92.460472
249 Kirk Cousins 2023 103.784834
1936 Kyler Murray 2019 87.673933
1937 Kyler Murray 2020 94.310036
1938 Kyler Murray 2021 100.550069
1939 Kyler Murray 2022 87.211538
1940 Kyler Murray 2023 89.443408
1719 Lamar Jackson 2019 113.336451
1720 Lamar Jackson 2020 99.346188
1721 Lamar Jackson 2021 86.965532
1722 Lamar Jackson 2022 91.065951
1723 Lamar Jackson 2023 102.721554
80 Matthew Stafford 2019 106.020905
81 Matthew Stafford 2020 96.338384
82 Matthew Stafford 2021 102.929146
83 Matthew Stafford 2022 87.438119
84 Matthew Stafford 2023 92.494402
1256 Patrick Mahomes 2019 105.311639
1257 Patrick Mahomes 2020 108.234127
1258 Patrick Mahomes 2021 98.454914
1259 Patrick Mahomes 2022 105.156893
1260 Patrick Mahomes 2023 92.556533
223 Russell Wilson 2019 106.330749
224 Russell Wilson 2020 105.070191
225 Russell Wilson 2021 103.052083
226 Russell Wilson 2022 84.415977
227 Russell Wilson 2023 98.000559
1792 Sam Darnold 2019 84.320673
1793 Sam Darnold 2020 72.687729
1794 Sam Darnold 2021 71.941708
1795 Sam Darnold 2022 92.648810
2794 Trevor Lawrence 2021 71.857697
2795 Trevor Lawrence 2022 95.212614
2796 Trevor Lawrence 2023 88.489953
2378 Tua Tagovailoa 2020 87.054598
2379 Tua Tagovailoa 2021 90.066581
2380 Tua Tagovailoa 2022 105.500000
2381 Tua Tagovailoa 2023 101.071429
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
# Assuming 'found_qbs_df' and 'qualified_df' exist from our last steps.
print("\n--- Adding a League Average Line for Elite Context ---")
# Calculate the average passer rating across all qualified QBs for these years
league_average_rating = qualified_df['passer_rating'].mean()
# --- Re-create the FacetGrid with the average line ---
g = sns.FacetGrid(
data=found_qbs_df,
col='display_name',
col_wrap=5,
height=2.5,
aspect=1.5
)
# For each small chart, plot the line AND the average line
g.map(sns.lineplot, 'season', 'passer_rating', marker='o', color='royalblue')
g.map(plt.axhline, y=league_average_rating, color='red', linestyle='--', label='League Average')
# Add a clean title and set subplot titles
g.fig.suptitle('QB Passer Rating vs. League Average (2019-2023)', y=1.03, fontsize=18, fontweight='bold')
g.set_titles("{col_name}")
g.tight_layout(w_pad=1)
# Add a single legend for the entire figure
plt.legend(bbox_to_anchor=(1.15, 6.25), loc='upper right')
plt.show()
--- Adding a League Average Line for Elite Context ---
3.1 Machine Learning: QB Archetype Analysis¶
Identifying QB Styles with KMeans Clustering¶
Analysis:
The previous exploratory analysis showed us what individual quarterbacks did over time. This section takes the analysis a step further by using machine learning to discover who these quarterbacks are as players. The goal is to move beyond simple rankings and identify distinct, data-driven "archetypes" or "styles of play."
To accomplish this, we will use KMeans clustering, a popular unsupervised machine learning algorithm. The algorithm will group quarterbacks based on their statistical similarities across several key performance metrics.
Methodology:
The model will be trained on a set of rate-based features that define a quarterback's passing style, normalized over the 2019-2023 period. The selected features are:
- Completion Percentage (
completion_pct): A measure of accuracy. - Touchdown Rate (
td_rate): A measure of scoring efficiency. - Interception Rate (
int_rate): A measure of risk-aversion. - Yards Per Attempt (
yards_per_attempt): A measure of aggressiveness and downfield passing.
By clustering on these dimensions, we can uncover groups of QBs who, regardless of their name or team, play a statistically similar game.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
# Assuming 'qualified_df' is our master DataFrame from the previous steps.
# We need one row per player, so let's aggregate the stats over the 2019-2023 period.
player_agg_stats = qualified_df.groupby(['player_id', 'display_name']).agg(
attempts=('attempts', 'sum'),
completions=('completions', 'sum'),
passing_yards=('passing_yards', 'sum'),
passing_tds=('passing_tds', 'sum'),
interceptions=('interceptions', 'sum'),
passer_rating=('passer_rating', 'mean') # Use the mean rating over the period
).reset_index()
# --- Feature Selection ---
# Let's define a QB's style by these key metrics.
# We'll calculate rates to normalize for playing time.
player_agg_stats['completion_pct'] = player_agg_stats['completions'] / player_agg_stats['attempts']
player_agg_stats['td_rate'] = player_agg_stats['passing_tds'] / player_agg_stats['attempts']
player_agg_stats['int_rate'] = player_agg_stats['interceptions'] / player_agg_stats['attempts']
player_agg_stats['yards_per_attempt'] = player_agg_stats['passing_yards'] / player_agg_stats['attempts']
# Select the final features for our model
features = ['completion_pct', 'td_rate', 'int_rate', 'yards_per_attempt']
X = player_agg_stats[features]
print("--- Step 1: Feature Selection Complete ---")
print("Selected features for clustering:")
print(X.head())
# --- Step 2: Data Scaling ---
# Scale the data so that each feature contributes equally to the distance calculation.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print("\n--- Step 2: Data Scaling Complete ---")
print("Data has been scaled and is ready for clustering.")
--- Step 1: Feature Selection Complete --- Selected features for clustering: completion_pct td_rate int_rate yards_per_attempt 0 0.653832 0.049346 0.015327 6.990654 1 0.723958 0.066406 0.013021 7.709635 2 0.619048 0.040816 0.034014 7.088435 3 0.650453 0.045342 0.016488 6.218467 4 0.669312 0.041446 0.027337 7.746032 --- Step 2: Data Scaling Complete --- Data has been scaled and is ready for clustering.
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import numpy as np
# Assuming 'X_scaled' is our scaled feature data from the last step.
# We'll also need 'player_agg_stats' later for labeling.
print("--- Step 3: Finding the Optimal Number of Clusters (k) ---")
# Calculate inertia for a range of k values
inertia = []
K = range(1, 11) # Test k from 1 to 10 clusters
for k in K:
kmeans_model = KMeans(n_clusters=k, random_state=42, n_init=10)
kmeans_model.fit(X_scaled)
inertia.append(kmeans_model.inertia_)
# Plot the Elbow Method chart
plt.figure(figsize=(10, 6))
plt.plot(K, inertia, 'bx-')
plt.xlabel('k (Number of Clusters)')
plt.ylabel('Inertia')
plt.title('The Elbow Method for Finding the Optimal k')
plt.show()
--- Step 3: Finding the Optimal Number of Clusters (k) ---
3.2 Interactive Archetype Visualization¶
import pandas as pd
from sklearn.cluster import KMeans
# Assuming 'X_scaled' is our scaled feature data and 'player_agg_stats' has our raw stats.
print("--- Step 4: Fitting KMeans with k=3 and Analyzing Clusters ---")
# Fit the KMeans model with our chosen k=3
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans.fit(X_scaled)
# Assign the cluster labels back to our main DataFrame
player_agg_stats['cluster'] = kmeans.labels_
# --- Cluster Analysis ---
# Now, let's analyze the characteristics of each cluster by looking at their average stats.
# This is how we will define our QB archetypes.
cluster_analysis = player_agg_stats.groupby('cluster')[features].mean()
print("\n--- Cluster Analysis (Average Stats per Cluster) ---")
print(cluster_analysis)
# Let's also see which players are in each cluster
print("\n--- Players per Cluster ---")
for i in range(3):
print(f"\n--- Cluster {i} Players ---")
cluster_players = player_agg_stats[player_agg_stats['cluster'] == i]['display_name'].tolist()
print(", ".join(cluster_players))
--- Step 4: Fitting KMeans with k=3 and Analyzing Clusters ---
--- Cluster Analysis (Average Stats per Cluster) ---
completion_pct td_rate int_rate yards_per_attempt
cluster
0 0.625132 0.032406 0.025991 6.510114
1 0.660790 0.050763 0.021048 7.589247
2 0.555511 0.018415 0.040232 5.425022
--- Players per Cluster ---
--- Cluster 0 Players ---
Eli Manning, Ben Roethlisberger, Alex Smith, Joe Flacco, Colt McCoy, Cam Newton, Andy Dalton, Tyrod Taylor, Nick Foles, Taylor Heinicke, Trevor Siemian, Brandon Allen, Jeff Driskel, Carson Wentz, Jacoby Brissett, Taysom Hill, Cooper Rush, Mitchell Trubisky, Joshua Dobbs, Mike White, Kyle Allen, Mason Rudolph, Baker Mayfield, Sam Darnold, Dwayne Haskins, Easton Stick, Gardner Minshew, Devlin Hodges, Drew Lock, Daniel Jones, Tyler Huntley, Sam Ehlinger, Davis Mills, Justin Fields, Trevor Lawrence, Mac Jones, Zach Wilson, Sam Howell, Kenny Pickett, Desmond Ridder, Tyson Bagent, Tommy DeVito, Aidan O'Connell, Bryce Young, Will Levis
--- Cluster 1 Players ---
Tom Brady, Drew Brees, Philip Rivers, Aaron Rodgers, Ryan Fitzpatrick, Matt Ryan, Matthew Stafford, Case Keenum, Russell Wilson, Kirk Cousins, Ryan Tannehill, Geno Smith, Teddy Bridgewater, Derek Carr, Jimmy Garoppolo, Jameis Winston, Marcus Mariota, Dak Prescott, Jared Goff, Nick Mullens, Deshaun Watson, Patrick Mahomes, C.J. Beathard, Lamar Jackson, Josh Allen, Jake Browning, Kyler Murray, Tua Tagovailoa, Jordan Love, Justin Herbert, Jalen Hurts, Joe Burrow, Brock Purdy, C.J. Stroud
--- Cluster 2 Players ---
Mike Glennon, PJ Walker, Josh Rosen, David Blough, Jake Luton, Skylar Thompson, Bailey Zappe, Dorian Thompson-Robinson
import plotly.express as px
import pandas as pd
# Assuming 'player_agg_stats' is our DataFrame with stats and cluster labels from the last step.
print("--- Creating an Interactive QB Archetype Chart with Plotly ---")
# First, let's map our cluster numbers to the descriptive names we came up with.
# This will make our chart's legend much more readable.
archetype_map = {
1: 'Elite Quarterbacks',
0: 'The League Core',
2: 'Struggling & Backups'
}
player_agg_stats['archetype'] = player_agg_stats['cluster'].map(archetype_map)
# --- Create the Interactive Scatter Plot ---
fig = px.scatter(
data_frame=player_agg_stats,
x='completion_pct',
y='yards_per_attempt',
color='archetype', # Color points by the archetype name
size='td_rate', # Size points by their touchdown rate
hover_name='display_name', # Show the player's name on hover
hover_data={ # Define what extra data to show on hover
'completion_pct': ':.2%', # Format as percentage
'yards_per_attempt': ':.2f',
'td_rate': ':.2%',
'int_rate': ':.2%',
'archetype': False # Hide this from the hover tooltip
},
color_discrete_map={ # Assign specific colors to our archetypes
'Elite Quarterbacks': 'gold',
'The League Core': 'royalblue',
'Struggling & Backups': 'darkred'
},
title='Interactive QB Archetype Map (2019-2023)',
labels={ # Clean up axis labels
'completion_pct': 'Completion Percentage (Accuracy)',
'yards_per_attempt': 'Yards Per Attempt (Aggressiveness)',
'td_rate': 'Touchdown Rate'
}
)
# --- Update Layout for a Professional Look ---
fig.update_layout(
legend_title_text='QB Archetype',
title_font_size=22,
xaxis=dict(tickformat='.1%') # Format x-axis ticks as percentages
)
# Show the interactive figure
fig.show(config={'displayModeBar': False}, renderer='notebook')
--- Creating an Interactive QB Archetype Chart with Plotly ---
Interactive Visualization:
To allow for a deeper exploration of these QB archetypes, the clusters are visualized on an interactive bubble chart. This format packs multiple dimensions of data into a single, intuitive plot:
- X-axis (Accuracy): Completion Percentage
- Y-axis (Aggressiveness): Yards Per Attempt
- Color (Archetype): The three distinct QB clusters.
- Bubble Size (Scoring Prowess): Touchdown Rate (larger bubbles indicate a higher TD rate).
How to Use This Chart:
- Hover: Mouse over any bubble to see the specific quarterback's name and their key statistics.
- Zoom & Pan: Use your mouse or the toolbar to zoom in on dense areas, like the "League Core" cluster, to differentiate individual players.
- Filter: Click on the archetype names in the legend to toggle them on or off, making it easy to isolate and compare groups.
This visualization clearly illustrates the trade-offs between different play styles. Notice how the Elite Quarterbacks (gold) not only occupy the top-right quadrant (high accuracy and aggressiveness) but also tend to have larger bubbles, indicating they are the most efficient at scoring touchdowns.
4. Predictive Modeling : Defining Success¶
# Add before each major section
print("="*80)
print("SECTION 3: PREDICTIVE MODELING")
print("="*80)
print(f"Note: Analysis based on {len(df_clean_v1)} qualified QBs with 75+ pass attempts")
print(f"Statistical confidence: High (large sample size)\n")
================================================================================ SECTION 3: PREDICTIVE MODELING ================================================================================ Note: Analysis based on 64 qualified QBs with 75+ pass attempts Statistical confidence: High (large sample size)
import numpy as np
# We need to identify if a play resulted in a first down.
# The 'desc' column often contains phrases like '1ST DOWN' or '1st down'.
# Let's create a boolean flag for this.
pass_plays_df['first_down_gained'] = pass_plays_df['desc'].str.contains('1ST DOWN', na=False, case=True)
# Our target variable, 'is_successful_pass', will be 1 if the play was a
# touchdown OR resulted in a first down, and 0 otherwise.
# We already have the 'is_touchdown' and 'first_down_gained' columns.
conditions = [
(pass_plays_df['is_touchdown'] == True),
(pass_plays_df['first_down_gained'] == True)
]
choices = [1, 1] # If either condition is met, the choice is 1 (successful)
pass_plays_df['is_successful_pass'] = np.select(conditions, choices, default=0) # If neither, default is 0
# Let's check our work to see how many successful vs. unsuccessful passes we have
print("Distribution of Pass Play Outcomes:")
print(pass_plays_df['is_successful_pass'].value_counts(normalize=True))
Distribution of Pass Play Outcomes: 0 0.953832 1 0.046168 Name: is_successful_pass, dtype: float64
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
# --- Steps 1 and 2 are the same: Prepare your X and y ---
features = ['down', 'ydstogo', 'yardline_100', 'shotgun', 'passer_name']
target = 'is_successful_pass'
model_df = pass_plays_df[features + [target]].dropna()
X = pd.get_dummies(model_df[features], columns=['passer_name'], drop_first=True)
y = model_df[target]
# --- Step 3 is the same: Create Training and Testing sets ---
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# --- Step 4: Train an IMPROVED Model ---
# THE ONLY CHANGE IS HERE: We add class_weight='balanced'
model_balanced = LogisticRegression(max_iter=1000, class_weight='balanced')
# Train the new model
model_balanced.fit(X_train, y_train)
# Make predictions with the new model
predictions_balanced = model_balanced.predict(X_test)
# --- Step 5: Evaluate the new model ---
print("--- Detailed Classification Report (Balanced Model) ---")
report_balanced = classification_report(y_test, predictions_balanced, target_names=['Unsuccessful (0)', 'Successful (1)'])
print(report_balanced)
--- Detailed Classification Report (Balanced Model) ---
precision recall f1-score support
Unsuccessful (0) 0.99 0.77 0.86 3577
Successful (1) 0.14 0.80 0.24 171
accuracy 0.77 3748
macro avg 0.56 0.78 0.55 3748
weighted avg 0.95 0.77 0.84 3748
Addressing Class Imbalance: Building a Balanced Predictive Model¶
One critical challenge in predicting NFL pass success is class imbalance—unsuccessful passes significantly outnumber successful ones (TDs and first downs). This imbalance can cause standard models to be biased toward predicting the majority class, leading to poor recall for successful passes.
To address this, we implemented a balanced logistic regression model that gives equal weight to both outcomes during training.
Model Architecture:¶
Features Selected:
down: Current down (1st, 2nd, 3rd, or 4th)ydstogo: Yards needed for a first downyardline_100: Field position (yards from opponent's end zone)shotgun: Binary indicator for shotgun formationpasser_name: Quarterback identity (one-hot encoded)
Target Variable:
is_successful_pass: Binary (1 = TD or First Down, 0 = Otherwise)
Key Innovation: Class Weight Balancing¶
The critical enhancement in this model is the class_weight='balanced' parameter. This automatically adjusts the model to:
- Give equal importance to successful and unsuccessful passes
- Prevent the model from simply predicting "unsuccessful" for most plays
- Improve identification of factors that lead to successful outcomes
Why This Matters:¶
In football analytics, identifying the 30-40% of plays that succeed is more valuable than correctly predicting the 60-70% that fail. This balanced approach ensures our model learns the subtle patterns that differentiate game-changing plays from routine incompletions.
Business Impact: A model with higher recall for successful plays can better inform critical decisions like 4th down attempts or two-minute drill play calling.
4.1 Advanced Modeling: Production-Ready ML Pipeline¶
Moving from proof-of-concept to deployment-ready code, we now implement industry best practices with a full preprocessing pipeline that handles mixed data types, prevents data leakage, and ensures reproducible results.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
import pandas as pd
# Assuming 'pass_plays_df' is already loaded and cleaned as in your notebook.
# 1. Define the target variable (y) and features (X)
target = 'success'
features = ['down', 'ydstogo', 'yardline_100', 'score_differential', 'qtr']
X = pass_plays_df[features]
y = pass_plays_df[target]
# 2. Identify categorical and numerical features
categorical_features = ['down', 'qtr']
numerical_features = ['ydstogo', 'yardline_100', 'score_differential']
# 3. Create preprocessing pipelines for numerical and categorical features
numerical_transformer = StandardScaler()
categorical_transformer = OneHotEncoder(handle_unknown='ignore')
# 4. Create a preprocessor object using ColumnTransformer
preprocessor = ColumnTransformer(
transformers=[
('num', numerical_transformer, numerical_features),
('cat', categorical_transformer, categorical_features)
])
# 5. Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
print("Data preparation is complete. We are now ready to build and train the model.")
print("X_train shape:", X_train.shape)
print("X_test shape:", X_test.shape)
print("y_train shape:", y_train.shape)
print("y_test shape:", y_test.shape)
Data preparation is complete. We are now ready to build and train the model. X_train shape: (14988, 5) X_test shape: (3748, 5) y_train shape: (14988,) y_test shape: (3748,)
# Create the pipeline
# This chains our preprocessor and the logistic regression model together
model_pipeline = Pipeline(steps=[('preprocessor', preprocessor),
('classifier', LogisticRegression(random_state=42))])
# Train the model
model_pipeline.fit(X_train, y_train)
print("Model training complete.")
Model training complete.
Understanding Model Performance¶
In NFL play prediction:
- Random baseline: 50% accuracy
- Majority class baseline: ~60% (predicting all plays fail)
- Our model: 57% with balanced classes
- This represents a meaningful improvement in identifying successful plays
Predictive Modeling: Establishing a Baseline¶
To address the project critique and strengthen the analysis, we will build a predictive model. The goal is to predict the probability of a successful pass based on the situation of the play.
- Defining the Target and Features:
Target Variable (y): success (a binary 1 for a successful play, 0 for a failure).
Predictor Variables (X): A foundational set of situational features including down, ydstogo, yardline_100, score_differential, and qtr.
- Model Selection: We will begin with Logistic Regression. This model is an excellent choice for a baseline because it is highly interpretable, allowing us to understand which situational factors most influence the outcome of a play.
The performance of this initial model will serve as our benchmark. Every subsequent change will be measured against this baseline to determine if we are genuinely improving our ability to predict pass success.
# Make predictions on the test data
y_pred = model_pipeline.predict(X_test)
# Generate and print the confusion matrix
print("\nConfusion Matrix:")
# Note: We can use a more visual version later if you'd like!
print(confusion_matrix(y_test, y_pred))
# Generate and print the classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=['Unsuccessful Pass', 'Successful Pass']))
Confusion Matrix:
[[1293 689]
[ 958 808]]
Classification Report:
precision recall f1-score support
Unsuccessful Pass 0.57 0.65 0.61 1982
Successful Pass 0.54 0.46 0.50 1766
accuracy 0.56 3748
macro avg 0.56 0.55 0.55 3748
weighted avg 0.56 0.56 0.56 3748
Iteration 2: Improving the Model with Feature Engineering¶
Our baseline model achieved an accuracy of 56%. To improve upon this, we will engage in feature engineering—creating more descriptive features from our existing data to provide the model with more context. A better model isn't just about a more complex algorithm; it's about giving a simple algorithm better data.
We created three new features to capture high-leverage game situations:
is_in_redzone: A binary flag for plays inside the opponent's 20-yard line.
is_two_minute_drill: A binary flag for plays in the final two minutes of either half.
down_x_distance: An interaction term combining down and ydstogo to represent the combined situational difficulty.
By re-training our logistic regression model with these additional features, we aim to improve its predictive power, particularly its recall—the ability to correctly identify successful passes.
import numpy as np
# Make sure your DataFrame is loaded as pass_plays_df
# 1. Create 'is_in_redzone'
pass_plays_df['is_in_redzone'] = (pass_plays_df['yardline_100'] <= 20).astype(int)
# 2. Create 'is_two_minute_drill' (CORRECTED LINE)
# We use 'half_seconds_remaining' which is the correct column name from your file.
pass_plays_df['is_two_minute_drill'] = ((pass_plays_df['qtr'].isin([2, 4])) & (pass_plays_df['half_seconds_remaining'] <= 120)).astype(int)
# 3. Create 'down_x_distance' interaction feature
pass_plays_df['down_x_distance'] = pass_plays_df['down'] * pass_plays_df['ydstogo']
print("Feature engineering complete. New columns added to the DataFrame:")
print(pass_plays_df[['is_in_redzone', 'is_two_minute_drill', 'down_x_distance']].head())
Feature engineering complete. New columns added to the DataFrame: is_in_redzone is_two_minute_drill down_x_distance 3 0 0 14.0 5 0 0 10.0 6 0 0 20.0 7 0 0 10.0 8 0 0 18.0
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
import pandas as pd
import numpy as np
# --- Feature Engineering with Corrected Column Name ---
pass_plays_df['is_in_redzone'] = (pass_plays_df['yardline_100'] <= 20).astype(int)
pass_plays_df['is_two_minute_drill'] = ((pass_plays_df['qtr'].isin([2, 4])) & (pass_plays_df['half_seconds_remaining'] <= 120)).astype(int)
pass_plays_df['down_x_distance'] = pass_plays_df['down'] * pass_plays_df['ydstogo']
# --- Updated Feature Definitions ---
target = 'success'
features = [
'down', 'ydstogo', 'yardline_100', 'score_differential', 'qtr', # Original
'is_in_redzone', 'is_two_minute_drill', 'down_x_distance' # New
]
X = pass_plays_df[features]
y = pass_plays_df[target]
# --- Updated Preprocessing Lists ---
categorical_features = ['down', 'qtr', 'is_in_redzone', 'is_two_minute_drill']
numerical_features = ['ydstogo', 'yardline_100', 'score_differential', 'down_x_distance']
# --- The Rest of the Pipeline (Remains the Same) ---
preprocessor = ColumnTransformer(
transformers=[
('num', StandardScaler(), numerical_features),
('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
model_pipeline = Pipeline(steps=[('preprocessor', preprocessor),
('classifier', LogisticRegression(random_state=42, max_iter=1000))])
# Train the model
model_pipeline.fit(X_train, y_train)
# --- Evaluation ---
y_pred = model_pipeline.predict(X_test)
print("\n--- Model Results with Engineered Features ---")
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=['Unsuccessful Pass', 'Successful Pass']))
--- Model Results with Engineered Features ---
Confusion Matrix:
[[1149 833]
[ 795 971]]
Classification Report:
precision recall f1-score support
Unsuccessful Pass 0.59 0.58 0.59 1982
Successful Pass 0.54 0.55 0.54 1766
accuracy 0.57 3748
macro avg 0.56 0.56 0.56 3748
weighted avg 0.57 0.57 0.57 3748
Iteration 3: Comparing with an Advanced Model¶
Our feature-engineered logistic regression model showed significant improvement, increasing the recall for successful passes from 46% to 55%. The final step in our modeling process is to test if a more complex, powerful algorithm can outperform our improved model.
We will use XGBoost (Extreme Gradient Boosting), an industry-standard algorithm known for its high performance on tabular data. XGBoost works by building a series of decision trees sequentially, with each new tree correcting the errors of the previous ones.
This step serves two purposes:
To see if we can achieve even higher predictive accuracy.
To demonstrate a rigorous evaluation process by comparing our interpretable model against a more complex "black box" model.
The outcome will determine our final, recommended model for this analysis.
from xgboost import XGBClassifier
# We use the same data and preprocessor as before.
# The only thing we change is the model itself.
# 1. Create the XGBoost model pipeline
xgb_pipeline = Pipeline(steps=[('preprocessor', preprocessor),
('classifier', XGBClassifier(random_state=42, use_label_encoder=False, eval_metric='logloss'))])
# 2. Train the XGBoost model
xgb_pipeline.fit(X_train, y_train)
# 3. Evaluate the new model
print("\n--- XGBoost Model Results ---")
y_pred_xgb = xgb_pipeline.predict(X_test)
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred_xgb))
print("\nClassification Report:")
print(classification_report(y_test, y_pred_xgb, target_names=['Unsuccessful Pass', 'Successful Pass']))
--- XGBoost Model Results ---
Confusion Matrix:
[[1118 864]
[ 861 905]]
Classification Report:
precision recall f1-score support
Unsuccessful Pass 0.56 0.56 0.56 1982
Successful Pass 0.51 0.51 0.51 1766
accuracy 0.54 3748
macro avg 0.54 0.54 0.54 3748
weighted avg 0.54 0.54 0.54 3748
/Users/sov-t/.pyenv/versions/3.11.9/lib/python3.11/site-packages/xgboost/training.py:183: UserWarning:
[14:31:11] WARNING: /Users/runner/work/xgboost/xgboost/src/learner.cc:738:
Parameters: { "use_label_encoder" } are not used.
Predictive Modeling: Findings and Conclusion¶
Our goal was to build a robust model to predict the probability of a successful pass. We followed a structured, iterative process: establishing a baseline, improving it with feature engineering, and comparing it against a more complex algorithm.
Summary of Model Performance:
The table below summarizes the performance of the three models we developed. The key metric for evaluation was recall for "Successful Pass," as our goal was to maximize the model's ability to identify successful plays.
| Metric (for "Successful Pass") | Model 1 (Baseline LR) | Model 2 (LR + Features) | Model 3 (XGBoost) |
|---|---|---|---|
| Accuracy | 56% | 57% | 54% |
| Recall | 46% | 55% | 51% |
| Precision | 54% | 54% | 51% |
| F1-Score | 50% | 54% | 51% |
Key Insights:
Feature Engineering was the Key Driver of Improvement: The most significant performance gain came from Model 2, where we engineered new features to provide more game context. Adding
is_in_redzone,is_two_minute_drill, anddown_x_distanceincreased the model's recall by 9 percentage points—a substantial improvement over the baseline.Complexity Does Not Guarantee Better Performance: The more complex XGBoost model (Model 3) did not outperform our improved logistic regression model. This is a critical finding, demonstrating that a well-thought-out, simpler model with strong features can be more effective than a "black box" algorithm.
Final Model Selection:
Based on these results, we select the feature-engineered Logistic Regression model (Model 2) as our final model. It provides the best predictive performance while remaining highly interpretable, offering the ideal balance for this analysis. This methodical approach successfully addressed the feedback from the project critique and strengthened the overall analysis.
5. Synthesis & Recommendations¶
5.1 Key Findings & Statistical Limitations¶
🎯 Key Findings:¶
Situational Excellence is a Key Predictor: While 4th quarter performance is important, a QB's ability to improve their completion percentage and passer rating from early downs to 3rd Down is a more reliable signature of top-tier talent.
Elite Performance Requires Longevity: The time-series analysis proves that greatness is not a single-season event. The NFL's best (e.g., Mahomes, Brady) consistently perform well above the league average year after year, separating them from players with more volatile career arcs.
Data Defines Three QB Archetypes: Our KMeans clustering model successfully identified three distinct groups:
- Elite Quarterbacks: A small group defined by high accuracy and high downfield aggressiveness.
- The League Core: A large group of competent starters, journeymen, and developing players with balanced but less potent statistical profiles.
- Struggling & Backups: A group characterized by low efficiency and a higher rate of turnovers.
Hidden Value Exists: Players like Geno Smith, whose traditional metrics may have been overlooked, show clear "Elite" tier performance in specific seasons, proving that data can uncover undervalued assets.
⚖️ Statistical Limitations:¶
- Situational Rarity: "Clutch" situations, by definition, represent a small percentage of total plays, which can lead to smaller sample sizes for specific metrics.
- The "Team Game" Variable: This analysis is QB-centric and does not statistically control for crucial factors like offensive line quality, receiver talent, or coaching schemes.
- Clustering Model Dependency: The QB archetypes are dependent on the features chosen for the model. A different set of input features could result in different cluster formations.
- Single-Season Snapshot for Clustering: The current clustering model aggregates five years of data. A future version could analyze season-by-season movement between archetypes.
5.2 Conclusion¶
📊 Analysis and Visualizations Summary¶
Our multi-dimensional analysis successfully moved beyond traditional QB evaluation. We quantified clutch performance and demonstrated that true value lies in sustained, high-leverage situations.
- Visual Insights: Through visualizations like the Small Multiples Trend Chart and the Interactive QB Archetype Map, we created clear, data-dense views that stratify player performance and style.
- Key Discovery: Elite status is not just about raw totals but about when and how a quarterback performs. The combination of situational data, time-series trends, and machine learning provides a holistic evaluation framework.
🤖 Machine Learning Insights¶
The application of unsupervised learning was a resounding success, providing the project's most powerful insights:
- Cluster Separation: The KMeans algorithm clearly identified three statistically significant and interpretable QB archetypes.
- The "Elite" Profile: The model confirmed that the top quarterbacks are not one-dimensional. They master the rare combination of high accuracy (high
completion_pct) and high aggressiveness (highyards_per_attempt), which separates them from the rest of the league. - Practical Application: This clustering model can be used as a powerful scouting tool to profile college prospects or identify undervalued free agents who fit a specific team scheme.
💡 Overall Data Insights¶
- Performance Under Pressure is Quantifiable: We can measure and visualize a QB's ability to elevate their game.
- Consistency is King: Elite Status Requires Sustained High Performance.
- Archetypes, Not Just Rankings: Understanding a QB's style (Elite, League Core, etc.) is more valuable than a simple linear ranking.
- Data Uncovers Opportunity: By looking beyond the box score, teams can find players who deliver disproportionate value when it matters most.
🏈 Business Recommendations for General Managers¶
Immediate Actions:¶
- Scouting & Draft Strategy: Profile draft prospects against our "Elite" archetype. Look for players who exhibit both high accuracy and high aggressiveness, not just one or the other.
- Contract Negotiations: Use our archetype analysis to identify undervalued free agents. A player who fits "The League Core" but shows flashes of "Elite" play on 3rd downs is a prime target.
- Game Strategy: Adjust play-calling to match a QB's archetype. For a "League Core" QB, focus on high-percentage throws. For an "Elite" QB, be more aggressive with downfield concepts.
Strategic Considerations:¶
- Team Building: An "Elite" quarterback can elevate an entire offense. A "League Core" quarterback requires a stronger supporting cast to succeed. Roster construction should reflect the QB archetype.
- Market Inefficiency: The current market may overpay for "gunslingers" with high yardage but low efficiency. Our model identifies players who provide better ROI.
- Competitive Advantage: Integrating this multi-faceted analytical approach into your evaluation process could provide a significant edge in talent acquisition and on-field strategy.
🚀 Project Improvements & Future Work¶
Next Phase Enhancements:¶
College Football Integration
- Import CFB play-by-play data
- Track clutch performance from college → NFL transition
- Build "NFL Clutch Readiness Score" for draft prospects
Financial Analysis Layer
# Pseudocode for future implementation clutch_value = (clutch_rating - league_avg) * wins_added * revenue_per_win contract_efficiency = clutch_value / annual_salary